Overflowing to Kafka (option B) is better. Actually, I would dump all the
activations there and have a separate process drain that Kafka topic to the
datastore or logstore.
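A sketch of such a drain process, with the Kafka consumer and the store abstracted as plain functions so only the control flow is shown (the names here are illustrative, not OpenWhisk or Kafka APIs). Committing the consumer offset only after a successful write gives at-least-once delivery into the store:

```scala
object ActivationDrainer {
  /** Drain records until `poll` returns an empty batch.
    * `poll`    - fetch the next batch from the overflow topic
    * `persist` - write a batch to the ArtifactStore / logstore
    * `commit`  - advance the consumer offset, called only after a
    *             successful write (at-least-once semantics)
    * Returns the total number of records stored. */
  def drain(poll: () => Seq[String],
            persist: Seq[String] => Unit,
            commit: () => Unit): Int = {
    var total = 0
    var batch = poll()
    while (batch.nonEmpty) {
      persist(batch) // store the whole batch first...
      commit()       // ...then acknowledge it to the broker
      total += batch.size
      batch = poll()
    }
    total
  }
}
```

If the store write throws, the offset is never committed and the batch is re-delivered on restart, so the store may see duplicates but never loses a record.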

There is another approach: route the logs directly to a logstore without
going through the invoker at all. IBM may have experimented with this;
maybe someone else can comment on that.

-r

> On Jun 20, 2019, at 2:20 AM, Chetan Mehrotra <chetan.mehro...@gmail.com> 
> wrote:
> 
> Hi Team,
> 
> When the rate of activations is high (especially with concurrency
> enabled) in a specific invoker, it is possible that the rate of storing
> activation records in the ArtifactStore lags behind the rate at which
> they are generated.
> 
> For CouchDB this was somewhat mitigated by a Batcher implementation
> which internally uses the CouchDB bulk insert API (#2812) [1]. However,
> the Batcher is currently configured with a queue size of Int.MaxValue
> [2], which can potentially lead to the invoker going OOM.
> 
> We tried to implement similar support for CosmosDB (#4513) [3]. In our
> tests we saw that even a queue size of 100k filled up quickly under
> higher load.
> 
> For #2812 Rodric had mentioned the need to support backpressure [4]
> 
>> we should perhaps open an issue to refactor the relevant code so that we can 
>> backpressure the invoker feed when the activations can't be drained fast 
>> enough.
> 
> Currently the storeActivation call is not waited upon in
> ContainerProxy, and hence there is no backpressure. I wanted to check
> what options we could pursue if activations can't be drained fast
> enough.
> 
> Option A - Wait till enqueue
> -------------------------------------
> 
> When calling storeActivation, wait till the call gets "enqueued". If it
> gets rejected because the queue is full (assuming here that the
> ArtifactStore has queued storage), we wait and retry a few times. If it
> gets queued, we simply complete the call.
> 
> With this we would not occupy the invoker slot until storage is
> complete; instead we occupy it only slightly longer, until the
> activation gets enqueued into the in-memory buffer.
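A minimal sketch of this wait-till-enqueue path, using a plain JVM bounded queue (the queue size, retry count, and names are illustrative, not OpenWhisk code):

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

object BoundedEnqueue {
  /** Try to enqueue `record`, waiting up to `waitMs` per attempt for at
    * most `maxRetries` attempts. Returns true if the record was accepted
    * by the buffer; the caller never waits for the store write itself. */
  def enqueueWithRetry[A](queue: ArrayBlockingQueue[A],
                          record: A,
                          maxRetries: Int = 3,
                          waitMs: Long = 50): Boolean = {
    var attempt = 0
    var accepted = false
    while (!accepted && attempt < maxRetries) {
      // offer with a timeout blocks briefly if the queue is full,
      // giving the drain side a chance to free a slot
      accepted = queue.offer(record, waitMs, TimeUnit.MILLISECONDS)
      attempt += 1
    }
    accepted
  }
}
```

The invoker slot is released as soon as enqueueWithRetry returns, whether or not the store write has completed; a false return is the signal to apply backpressure or fall back to another path.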
> 
> Option B - Overflow to Kafka and new OverflownActivationRecorderService
> ----------------------------------------------------------------------------------------------------
> 
> Enqueuing activations in memory always risks increasing heap pressure,
> especially if the activation size is large (a user-controlled aspect).
> So another option would be to overflow to Kafka when storing
> activations.
> 
> If the internal queue is full (the queue size can now be small), we
> would enqueue the record to Kafka, which would in general have a higher
> throughput rate than normal storage.
> 
> Then on the other end we can have a new microservice which polls this
> "overflowActivations" topic and persists the records in the
> ArtifactStore. We can even use a single but partitioned topic so that
> queue processing can be scaled out across multiple service instances if
> needed.
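The overflow decision on this write path can be sketched as follows, with the Kafka producer abstracted as a publish function (the topic name "overflowActivations" is from the proposal above; everything else is an illustrative assumption, not OpenWhisk code):

```scala
import java.util.concurrent.ArrayBlockingQueue

object OverflowingStore {
  sealed trait Route
  case object InMemory extends Route   // record accepted by the small buffer
  case object Overflowed extends Route // record published to Kafka instead

  /** Offer the record to the bounded in-memory buffer; on rejection,
    * publish it to the "overflowActivations" topic via `publish` rather
    * than growing the heap. Returns which path was taken. */
  def store[A](queue: ArrayBlockingQueue[A],
               record: A,
               publish: A => Unit): Route =
    if (queue.offer(record)) InMemory // non-blocking offer: false when full
    else { publish(record); Overflowed }
}
```

Because the buffer stays small, heap usage is bounded regardless of activation size, and the slower store is shielded by Kafka's higher ingest throughput.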
> 
> Any feedback on possible option to pursue would be helpful!
> 
> Chetan Mehrotra
> [1]: https://github.com/apache/incubator-openwhisk/pull/2812
> [2]: 
> https://github.com/apache/incubator-openwhisk/blob/master/common/scala/src/main/scala/org/apache/openwhisk/core/database/Batcher.scala#L56
> [3]: https://github.com/apache/incubator-openwhisk/pull/4513
> [4]: 
> https://github.com/apache/incubator-openwhisk/pull/2812#pullrequestreview-67378126
