I can see the value in delaying the binding of activations to invokers when the system is loaded (i.e., when an activation can't execute "immediately" on its target invoker).
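To make "delayed binding" concrete, here is a minimal Python sketch that assumes a single controller which knows each invoker's free slots exactly. All names are hypothetical, and Kafka topics are replaced by in-memory queues. An activation is bound to an invoker only when a slot is free. Otherwise it waits, unbound, in one shared backlog, which corresponds to the overflow topic in the proposal quoted below.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DelayedBindingScheduler:
    """Toy single-controller model of delayed binding (hypothetical names).

    An activation is bound to an invoker only when that invoker has a free
    slot; otherwise it waits, unbound, in a shared backlog (overflow) queue.
    """
    free_slots: dict[str, int]                 # invoker id -> free slots (assumed known exactly)
    backlog: deque = field(default_factory=deque)
    bound: list = field(default_factory=list)  # (activation, invoker) pairs in binding order

    def _free_invoker(self):
        # Choose the invoker with the most free slots; None if every invoker is full.
        best = max(self.free_slots, key=self.free_slots.get, default=None)
        return best if best is not None and self.free_slots[best] > 0 else None

    def _bind(self, activation, invoker):
        self.free_slots[invoker] -= 1
        self.bound.append((activation, invoker))

    def submit(self, activation):
        invoker = self._free_invoker()
        if invoker is None:
            self.backlog.append(activation)  # overflow: binding is deferred
        else:
            self._bind(activation, invoker)  # the "immediately" path

    def complete(self, invoker):
        # A slot frees up; the oldest backlog entry (if any) is bound to it first.
        self.free_slots[invoker] += 1
        if self.backlog:
            self._bind(self.backlog.popleft(), invoker)


s = DelayedBindingScheduler(free_slots={"inv0": 1, "inv1": 1})
for a in ("a1", "a2", "a3"):
    s.submit(a)        # a1 -> inv0, a2 -> inv1, a3 -> backlog
s.complete("inv1")     # inv1 frees a slot and picks up a3
print(s.bound)         # -> [('a1', 'inv0'), ('a2', 'inv1'), ('a3', 'inv1')]
```

In this idealized model, a freed slot always goes to the oldest backlog entry first. New work therefore takes the "immediately" path only when the backlog is empty. The draining question raised below appears once capacity knowledge is spread across controllers and only approximate, which this sketch does not model.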
Perhaps in ignorance, I am a little worried about the scalability of a single backlog topic. With a few hundred invokers, it seems like we'd be exposed to frequent and expensive partition rebalancing operations as invokers crash or restart. Maybe if we have N = K*M invokers, we can get away with M backlog topics, each read by K invokers. We could still get imbalance across the different backlog topics, but it might be good enough.

I think we'd also need to think about how to ensure that work put in a backlog topic doesn't languish there for a really long time. Once we start having work in the backlog, do we need to stop putting work in the "immediately" topics? If we do, that could hurt overall performance. If we don't, how will the backlog topic ever get drained if most invokers are kept busy servicing their "immediately" topics?

--dave

From: Tyson Norris <[email protected]>
To: "[email protected]" <[email protected]>
Date: 10/04/2017 07:45 PM
Subject: Invoker activation queueing proposal

Hi -

I've been discussing with a few people how to optimize the queueing that happens ahead of the invokers, so that things behave more simply and predictably.

In short: instead of scheduling activations to an invoker on receipt, do the following:
- execute the activation "immediately" if capacity is available
- provide a single overflow topic for activations that cannot execute "immediately"
- schedule from the overflow topic when capacity is available

(BTW, "immediately" means: still queued via the existing invoker topics, but ONLY queued there when the invoker is not fully loaded and therefore should execute it "very soon".)

Later: it would also be good to provide more container state data from the invoker to the controller, to get better scheduling options, e.g.
if some invokers can handle running more containers than others, that info can be used to avoid over- or under-loading the invokers (currently we assume each invoker can handle 16 activations, I think).

I put a wiki page proposal here: https://cwiki.apache.org/confluence/display/OPENWHISK/Invoker+Activation+Queueing+Change

WDYT?

Thanks
Tyson
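The "Later" idea of having invokers report container capacity could feed a selection rule like the following Python sketch. It assumes a hypothetical per-invoker report containing in-flight activations and an optional capacity. When no capacity is reported, it falls back to the fixed 16 mentioned above. The invoker with the lowest load relative to its own capacity is chosen, and `None` means "send to the overflow topic".

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_CAPACITY = 16  # the fixed per-invoker assumption mentioned in the proposal


@dataclass
class InvokerState:
    """Hypothetical state report sent from an invoker to the controller."""
    name: str
    in_flight: int
    capacity: Optional[int] = None  # None: not reported, fall back to DEFAULT_CAPACITY

    @property
    def effective_capacity(self) -> int:
        return self.capacity if self.capacity is not None else DEFAULT_CAPACITY

    @property
    def free(self) -> int:
        return max(self.effective_capacity - self.in_flight, 0)


def pick_invoker(states: list[InvokerState]) -> Optional[str]:
    """Least-loaded invoker (relative to its own capacity) with a free slot,
    or None, meaning the activation should go to the overflow topic."""
    candidates = [s for s in states if s.free > 0]
    if not candidates:
        return None
    # effective_capacity > in_flight >= 0 for every candidate, so no division by zero.
    best = min(candidates, key=lambda s: s.in_flight / s.effective_capacity)
    return best.name


states = [
    InvokerState("small", in_flight=6, capacity=8),
    InvokerState("big", in_flight=10, capacity=32),
    InvokerState("legacy", in_flight=16),  # no report: assumed 16, so it is full
]
print(pick_invoker(states))  # -> big
```

If every invoker were assumed to hold 16, "small" (6/16) would be picked over "big" (10/16), even though "small" is already at 75% of its real capacity. That is the kind of over- and under-loading the reported data could avoid.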

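Dave's N = K*M suggestion can be illustrated with a small Python sketch that partitions invokers into M groups of K, one backlog topic per group. It assumes one consumer group per backlog topic, so a crash or restart only rebalances that invoker's K peers rather than all N invokers. The topic names are hypothetical.

```python
def _group(invoker_index: int, k: int, m: int) -> int:
    # Invokers 0..K-1 form group 0, K..2K-1 form group 1, and so on.
    if not 0 <= invoker_index < k * m:
        raise ValueError(f"invoker index {invoker_index} out of range for N = {k * m}")
    return invoker_index // k


def backlog_topic(invoker_index: int, k: int, m: int) -> str:
    """Backlog topic read by this invoker; each of the M topics has exactly K readers."""
    return f"backlog-{_group(invoker_index, k, m)}"


def peers_in_rebalance(invoker_index: int, k: int, m: int) -> list[int]:
    """Invokers sharing a backlog topic with `invoker_index`. Under the
    one-consumer-group-per-topic assumption, only these K rebalance on a restart."""
    g = _group(invoker_index, k, m)
    return list(range(g * k, (g + 1) * k))


# A few hundred invokers: N = 300 as K = 10 invokers per topic, M = 30 topics.
print(backlog_topic(123, k=10, m=30))             # -> backlog-12
print(len(peers_in_rebalance(123, k=10, m=30)))   # -> 10
```

As Dave notes, load can still become imbalanced across the M topics. The controller would also need a policy for choosing which backlog topic to publish to, and this sketch does not model one.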