With many invokers, there is less data exposed to rebalancing operations, since 
the invoker topics will only ever receive enough activations that can be 
processed “immediately", currently set to 16. The single backlog topic would 
only be consumed by the controller (not any invoker), and the invokers would 
only consumer their respective “process immediately” topic - which effectively 
has no, or very little, backlog - 16 max. My suggestion is that having multiple 
backlogs is an unnecessary problem, regardless of how many invokers there are.

It is worth noting the case of multiple controllers as well, where multiple 
controllers may be processing the same backlog topic. I don’t think this should 
cause any more trouble than the distributed activation counting that should be 
enabled via controller clustering, but it may mean that if one controller 
enters overflow state, it should signal that ALL controllers are now in 
overflow state, etc.

Regarding “timeout”, I would plan to use the existing timeout mechanism, where 
an ActivationEntry is created immediately, regardless of whether the activation 
is going to get processed, or get added to the backlog. At time of processing 
the backlog message, if the entry is timed out, throw it away. (The entry map 
may need to be shared in the case multiple invokers are in use, and they all 
consume from the same topic; alternatively, we can partition the topic so that 
entries are only processed by the controller that has backlogged them)

Yes, once invokers are saturated, and backlogging begins, I think all incoming 
activations should be sent straight to backlog (we already know that no 
invokers are available). This should not hurt overall performance anymore than 
it currently does, and should be better (since the first invoker available can 
start taking work, instead of waiting on a specific invoker to become 
available).

I’m working on a PR, I think much of these details will come out there, but in 
the meantime, let me know if any of this doesn’t make sense.

Thanks
Tyson


On Oct 5, 2017, at 2:49 PM, David P Grove 
<[email protected]<mailto:[email protected]>> wrote:


I can see the value in delaying the binding of activations to invokers when the 
system is loaded (can't execute "immediately" on its target invoker).

Perhaps in ignorance, I am a little worried about the scalability of a single 
backlog topic. With a few hundred invokers, it seems like we'd be exposed to 
frequent and expensive partition rebalancing operations as invokers 
crash/restart. Maybe if we have N = K*M invokers, we can get away with M 
backlog topics each being read by K invokers. We could still get imbalance 
across the different backlog topics, but it might be good enough.

I think we'd also need to do some thinking of how to ensure that work put in a 
backlog topic doesn't languish there for a really long time. Once we start 
having work in the backlog, do we need to stop putting work in immediately 
topics? If we do, that could hurt overall performance. If we don't, how will 
the backlog topic ever get drained if most invokers are kept busy servicing 
their immediately topics?

--dave

Tyson Norris ---10/04/2017 07:45:38 PM---Hi - I’ve been discussing a bit with a 
few about optimizing the queueing that goes on ahead of invok

From:  Tyson Norris 
<[email protected]<mailto:[email protected]>>
To:  "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date:  10/04/2017 07:45 PM
Subject:  Invoker activation queueing proposal

________________________________



Hi -

I’ve been discussing a bit with a few about optimizing the queueing that goes 
on ahead of invokers so that things behave more simply and predictable.



In short: Instead of scheduling activations to an invoker on receipt, do the 
following:

- execute the activation "immediately" if capacity is available

- provide a single overflow topic for activations that cannot execute 
“immediately"

- schedule from the overflow topic when capacity is available



(BTW “Immediately” means: still queued via existing invoker topics, but ONLY 
gets queued there in the case that the invoker is not fully loaded, and 
therefore should execute it “very soon")



Later: it would also be good to provide more container state data from invoker 
to controller, to get better scheduling options - e.g. if some invokers can 
handle running more containers than other invokers, that info can be used to 
avoid over/under-loading the invokers (currently we assume each invoker can 
handle 16 activations, I think)



I put a wiki page proposal here: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_OPENWHISK_Invoker-2BActivation-2BQueueing-2BChange&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=Fe4FicGBU_20P2yihxV-apaNSFb6BSj6AlkptSF2gMk&m=UE8OIR_GnMltmRZyIuLVHMlzyQvNku-H7kLk67u45IM&s=LD75-npfzA7qzUGNgYbFBy4qKatnkdO5I2vKYSGUBg8&e=<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttps-3A__cwiki.apache.org_confluence_display_OPENWHISK_Invoker-2BActivation-2BQueueing-2BChange%26d%3DDwIGaQ%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DFe4FicGBU_20P2yihxV-apaNSFb6BSj6AlkptSF2gMk%26m%3DUE8OIR_GnMltmRZyIuLVHMlzyQvNku-H7kLk67u45IM%26s%3DLD75-npfzA7qzUGNgYbFBy4qKatnkdO5I2vKYSGUBg8%26e%3D&data=02%7C01%7C%7C36a3439a232c45d2119c08d50c3b096b%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636428370067764903&sdata=MBzAhIAVOdHCG0acu8YKCNmeYXO8T9PcILoQrlUyixw%3D&reserved=0>



WDYT?



Thanks

Tyson

Reply via email to