With many invokers, there is less data exposed to rebalancing operations, since the invoker topics will only ever receive enough activations that can be processed “immediately", currently set to 16. The single backlog topic would only be consumed by the controller (not any invoker), and the invokers would only consumer their respective “process immediately” topic - which effectively has no, or very little, backlog - 16 max. My suggestion is that having multiple backlogs is an unnecessary problem, regardless of how many invokers there are.
It is worth noting the case of multiple controllers as well, where multiple controllers may be processing the same backlog topic. I don’t think this should cause any more trouble than the distributed activation counting that should be enabled via controller clustering, but it may mean that if one controller enters overflow state, it should signal that ALL controllers are now in overflow state, etc. Regarding “timeout”, I would plan to use the existing timeout mechanism, where an ActivationEntry is created immediately, regardless of whether the activation is going to get processed, or get added to the backlog. At time of processing the backlog message, if the entry is timed out, throw it away. (The entry map may need to be shared in the case multiple invokers are in use, and they all consume from the same topic; alternatively, we can partition the topic so that entries are only processed by the controller that has backlogged them) Yes, once invokers are saturated, and backlogging begins, I think all incoming activations should be sent straight to backlog (we already know that no invokers are available). This should not hurt overall performance anymore than it currently does, and should be better (since the first invoker available can start taking work, instead of waiting on a specific invoker to become available). I’m working on a PR, I think much of these details will come out there, but in the meantime, let me know if any of this doesn’t make sense. Thanks Tyson On Oct 5, 2017, at 2:49 PM, David P Grove <[email protected]<mailto:[email protected]>> wrote: I can see the value in delaying the binding of activations to invokers when the system is loaded (can't execute "immediately" on its target invoker). Perhaps in ignorance, I am a little worried about the scalability of a single backlog topic. With a few hundred invokers, it seems like we'd be exposed to frequent and expensive partition rebalancing operations as invokers crash/restart. Maybe if we have N = K*M invokers, we can get away with M backlog topics each being read by K invokers. We could still get imbalance across the different backlog topics, but it might be good enough. I think we'd also need to do some thinking of how to ensure that work put in a backlog topic doesn't languish there for a really long time. Once we start having work in the backlog, do we need to stop putting work in immediately topics? If we do, that could hurt overall performance. If we don't, how will the backlog topic ever get drained if most invokers are kept busy servicing their immediately topics? --dave Tyson Norris ---10/04/2017 07:45:38 PM---Hi - I’ve been discussing a bit with a few about optimizing the queueing that goes on ahead of invok From: Tyson Norris <[email protected]<mailto:[email protected]>> To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Date: 10/04/2017 07:45 PM Subject: Invoker activation queueing proposal ________________________________ Hi - I’ve been discussing a bit with a few about optimizing the queueing that goes on ahead of invokers so that things behave more simply and predictable. In short: Instead of scheduling activations to an invoker on receipt, do the following: - execute the activation "immediately" if capacity is available - provide a single overflow topic for activations that cannot execute “immediately" - schedule from the overflow topic when capacity is available (BTW “Immediately” means: still queued via existing invoker topics, but ONLY gets queued there in the case that the invoker is not fully loaded, and therefore should execute it “very soon") Later: it would also be good to provide more container state data from invoker to controller, to get better scheduling options - e.g. if some invokers can handle running more containers than other invokers, that info can be used to avoid over/under-loading the invokers (currently we assume each invoker can handle 16 activations, I think) I put a wiki page proposal here: https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_OPENWHISK_Invoker-2BActivation-2BQueueing-2BChange&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=Fe4FicGBU_20P2yihxV-apaNSFb6BSj6AlkptSF2gMk&m=UE8OIR_GnMltmRZyIuLVHMlzyQvNku-H7kLk67u45IM&s=LD75-npfzA7qzUGNgYbFBy4qKatnkdO5I2vKYSGUBg8&e=<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttps-3A__cwiki.apache.org_confluence_display_OPENWHISK_Invoker-2BActivation-2BQueueing-2BChange%26d%3DDwIGaQ%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DFe4FicGBU_20P2yihxV-apaNSFb6BSj6AlkptSF2gMk%26m%3DUE8OIR_GnMltmRZyIuLVHMlzyQvNku-H7kLk67u45IM%26s%3DLD75-npfzA7qzUGNgYbFBy4qKatnkdO5I2vKYSGUBg8%26e%3D&data=02%7C01%7C%7C36a3439a232c45d2119c08d50c3b096b%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636428370067764903&sdata=MBzAhIAVOdHCG0acu8YKCNmeYXO8T9PcILoQrlUyixw%3D&reserved=0> WDYT? Thanks Tyson
