Heyho, I ran into the same issue before and I think our scheduling code should be an Actor. We could microbenchmark it to ensure it can happily schedule a large number of actions per second and not become a bottleneck.
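The microbenchmark idea can be sketched quickly. This is an illustrative Python sketch (not OpenWhisk code): a single-threaded scheduling loop, as an actor mailbox would give us, and a measurement of how many scheduling decisions per second it sustains. The `schedule` function and its least-loaded-first policy are assumptions for illustration only.

```python
import time

def schedule(counters, busy_threshold):
    """Pick the first invoker whose in-flight count is under the threshold."""
    for invoker, count in enumerate(counters):
        if count < busy_threshold:
            return invoker
    return None  # every invoker is saturated

def decisions_per_second(n_invokers=100, busy_threshold=16, decisions=50_000):
    """Measure raw throughput of the single-threaded scheduling loop."""
    counters = [0] * n_invokers
    start = time.perf_counter()
    for _ in range(decisions):
        chosen = schedule(counters, busy_threshold)
        if chosen is None:
            counters = [0] * n_invokers  # pretend all activations completed
        else:
            counters[chosen] += 1
    return decisions / (time.perf_counter() - start)
```

Even this naive linear scan comfortably makes thousands of decisions per second, which suggests a serialized (actor-based) scheduler is unlikely to be the bottleneck.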
+1 for actorizing the LB

Cheers,
Markus

Sent from my iPhone

> On Oct 10, 2017, at 13:28, Tyson Norris <[email protected]> wrote:
>
> Hi -
> Following up on this, I’ve been working on a PR.
>
> One issue I’ve run into (which may be problematic in other scheduling
> scenarios) is that the scheduling in LoadBalancerService doesn’t respect the
> new async nature of activation counting in LoadBalancerData. At least I think
> this is a good description.
>
> Specifically, I am creating a test that submits activations via
> LoadBalancer.publish, and I end up with 10 activations scheduled on invoker0,
> even though I use an invokerBusyThreshold of 1.
> It would only occur when concurrent requests (or requests with very little
> time between them?) arrive at the same controller, I think. (Otherwise the
> counts can sync up quickly enough.)
> I’ll work more on testing it.
>
> Assuming this (dealing with async counters) is the problem, I’m not exactly
> sure how to deal with it. Some options may include:
> - change LoadBalancer to an actor, so that local counter state can be more
>   easily managed (these counters would still need to replicate, but at least
>   locally it would do the right thing)
> - coordinate the schedule + setupActivation calls to also rely on some local
>   state for activations that should be counted but have not yet been
>   processed within LoadBalancerData
>
> Any suggestions in this area would be great.
>
> Thanks
> Tyson
>
>
>> On Oct 6, 2017, at 11:04 AM, Tyson Norris <[email protected]> wrote:
>>
>> With many invokers, there is less data exposed to rebalancing operations,
>> since the invoker topics will only ever receive enough activations that can
>> be processed “immediately”, currently set to 16. The single backlog topic
>> would only be consumed by the controller (not any invoker), and the invokers
>> would only consume their respective “process immediately” topic - which
>> effectively has no, or very little, backlog - 16 max.
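The race Tyson describes, and the first option for fixing it, can be sketched as follows. This is an illustrative Python sketch, not OpenWhisk's actual Scala code; the `LoadBalancerActor` class name is hypothetical. The point is that when the schedule decision and the counter update happen in one serialized step (as in an actor's mailbox), a burst of publishes cannot all observe a stale count of zero and pile onto invoker0.

```python
from collections import defaultdict

class LoadBalancerActor:
    """Single-threaded stand-in for an actor mailbox: scheduling and the
    counter update happen in one step, so concurrent publishes cannot all
    observe a stale in-flight count."""

    def __init__(self, num_invokers, busy_threshold):
        self.num_invokers = num_invokers
        self.busy_threshold = busy_threshold
        self.inflight = defaultdict(int)  # locally authoritative counts

    def publish(self, activation_id):
        # Messages are processed one at a time, so the count read here
        # already reflects every previously scheduled activation.
        for invoker in range(self.num_invokers):
            if self.inflight[invoker] < self.busy_threshold:
                self.inflight[invoker] += 1
                return invoker
        return None  # no capacity: would go to a backlog/overflow topic

    def complete(self, invoker):
        self.inflight[invoker] -= 1
```

With `busy_threshold=1`, ten back-to-back publishes now land on ten distinct invokers instead of all on invoker0, because each publish sees the increments made by the ones before it.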
>> My suggestion is that
>> having multiple backlogs is an unnecessary problem, regardless of how many
>> invokers there are.
>>
>> It is worth noting the case of multiple controllers as well, where multiple
>> controllers may be processing the same backlog topic. I don’t think this
>> should cause any more trouble than the distributed activation counting that
>> should be enabled via controller clustering, but it may mean that if one
>> controller enters overflow state, it should signal that ALL controllers are
>> now in overflow state, etc.
>>
>> Regarding “timeout”, I would plan to use the existing timeout mechanism,
>> where an ActivationEntry is created immediately, regardless of whether the
>> activation is going to get processed or added to the backlog. At the time
>> the backlog message is processed, if the entry has timed out, throw it away.
>> (The entry map may need to be shared in the case where multiple controllers
>> all consume from the same topic; alternatively, we can partition the topic
>> so that entries are only processed by the controller that backlogged them.)
>>
>> Yes, once invokers are saturated and backlogging begins, I think all
>> incoming activations should be sent straight to the backlog (we already know
>> that no invokers are available). This should not hurt overall performance
>> any more than it currently does, and should be better (since the first
>> invoker available can start taking work, instead of waiting on a specific
>> invoker to become available).
>>
>> I’m working on a PR; I think many of these details will come out there, but
>> in the meantime, let me know if any of this doesn’t make sense.
>>
>> Thanks
>> Tyson
>>
>>
>> On Oct 5, 2017, at 2:49 PM, David P Grove <[email protected]> wrote:
>>
>>
>> I can see the value in delaying the binding of activations to invokers when
>> the system is loaded (can't execute "immediately" on its target invoker).
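The timeout handling described above (create the ActivationEntry at enqueue time, discard it if it has expired by the time the backlog message is processed) can be sketched like this. This is an illustrative Python sketch under assumed names (`BacklogConsumer`, `next_runnable` are hypothetical), with a deque standing in for the overflow topic:

```python
import time
from collections import deque

class BacklogConsumer:
    """Entries are created at enqueue time; when the backlog message is
    finally processed, an entry that has already timed out is discarded."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.entries = {}       # activation id -> enqueue timestamp
        self.backlog = deque()  # stand-in for the single overflow topic

    def enqueue(self, activation_id, now=None):
        self.entries[activation_id] = time.monotonic() if now is None else now
        self.backlog.append(activation_id)

    def next_runnable(self, now=None):
        """Pop backlog messages until one is still within its timeout."""
        now = time.monotonic() if now is None else now
        while self.backlog:
            aid = self.backlog.popleft()
            if now - self.entries.pop(aid) <= self.timeout:
                return aid  # still fresh: hand it to the scheduler
            # otherwise: it expired while backlogged, throw it away
        return None
```

Note the entry map here is local; as the message above says, it would need to be shared (or the topic partitioned per controller) when more than one controller consumes the backlog.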
>>
>> Perhaps in ignorance, I am a little worried about the scalability of a
>> single backlog topic. With a few hundred invokers, it seems like we'd be
>> exposed to frequent and expensive partition rebalancing operations as
>> invokers crash/restart. Maybe if we have N = K*M invokers, we can get away
>> with M backlog topics, each being read by K invokers. We could still get
>> imbalance across the different backlog topics, but it might be good enough.
>>
>> I think we'd also need to do some thinking about how to ensure that work
>> put in a backlog topic doesn't languish there for a really long time. Once
>> we start having work in the backlog, do we need to stop putting work in the
>> "immediately" topics? If we do, that could hurt overall performance. If we
>> don't, how will the backlog topic ever get drained if most invokers are
>> kept busy servicing their "immediately" topics?
>>
>> --dave
>>
>> From: Tyson Norris <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Date: 10/04/2017 07:45 PM
>> Subject: Invoker activation queueing proposal
>>
>> ________________________________
>>
>> Hi -
>>
>> I’ve been discussing a bit with a few others about optimizing the queueing
>> that goes on ahead of the invokers, so that things behave more simply and
>> predictably.
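Dave's N = K*M partitioning suggestion is straightforward to sketch. An illustrative Python sketch (the function name and `backlog-<i>` topic names are hypothetical): invokers are divided round-robin across M backlog topics, so a crash/restart only triggers a rebalance among the K readers of one topic rather than all N.

```python
from collections import defaultdict

def assign_backlog_topics(n_invokers, m_topics):
    """Partition N = K*M invokers so each of M backlog topics is read by
    K invokers, shrinking the blast radius of a consumer-group rebalance."""
    assignment = defaultdict(list)
    for invoker in range(n_invokers):
        assignment["backlog-%d" % (invoker % m_topics)].append(invoker)
    return dict(assignment)
```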
>>
>> In short: Instead of scheduling activations to an invoker on receipt, do
>> the following:
>>
>> - execute the activation "immediately" if capacity is available
>> - provide a single overflow topic for activations that cannot execute
>>   "immediately"
>> - schedule from the overflow topic when capacity is available
>>
>> (BTW "immediately" means: still queued via the existing invoker topics, but
>> ONLY queued there in the case that the invoker is not fully loaded, and
>> therefore should execute it "very soon")
>>
>> Later: it would also be good to provide more container state data from
>> invoker to controller, to get better scheduling options - e.g. if some
>> invokers can handle running more containers than other invokers, that info
>> can be used to avoid over/under-loading the invokers (currently we assume
>> each invoker can handle 16 activations, I think).
>>
>> I put a wiki page proposal here:
>> https://cwiki.apache.org/confluence/display/OPENWHISK/Invoker+Activation+Queueing+Change
>>
>> WDYT?
>>
>> Thanks
>> Tyson
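The three-step flow in the proposal can be sketched end to end. An illustrative Python sketch, not OpenWhisK's actual code; the function names are hypothetical and deques stand in for Kafka topics: publish routes to an invoker topic only when capacity exists, otherwise to the single overflow topic, and a freed slot drains the overflow first, so the first available invoker picks up backlogged work.

```python
from collections import deque

def publish(activation, counters, busy_threshold, invoker_topics, overflow):
    """Send to an invoker topic only if that invoker has capacity right now;
    otherwise park the activation on the single overflow topic."""
    for invoker, count in enumerate(counters):
        if count < busy_threshold:
            counters[invoker] += 1
            invoker_topics[invoker].append(activation)
            return invoker
    overflow.append(activation)  # no capacity anywhere: backlog it
    return None

def on_completion(invoker, counters, busy_threshold, invoker_topics, overflow):
    """When an invoker finishes work, the freed slot is offered to the
    overflow topic first, instead of waiting for a specific invoker."""
    counters[invoker] -= 1
    if overflow and counters[invoker] < busy_threshold:
        counters[invoker] += 1
        invoker_topics[invoker].append(overflow.popleft())
```

With two invokers and a threshold of 1, a third publish lands on the overflow topic, and the first completion immediately pulls it onto the freed invoker's topic.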
