Hi Markus,

I like the approach you have taken.

Here’s a more general comment (relevant not only to this PR): The stated motivation for the PR is to improve performance. I have no doubt that this PR increases the (average) performance of action invocation for most relevant traffic patterns one would see in real life. However, I think you make some implicit assumptions about what these traffic patterns look like (and also about what the deployed topology looks like).
Which brings me to the actual comment :) It would be great if there were also a performance test case that simulates the traffic patterns you have in mind, e.g. a test like the one in your repo [1]. That would make it easier to discuss the improvement. (Again, I have no doubt in this case that the PR will help performance.)

my2c
Michael

[1] https://github.com/markusthoemmes/openwhisk-performance/pull/1

From: Markus Thömmes <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday 12 June 2017 20:20
To: "[email protected]" <[email protected]>
Subject: Loadbalancer Improvements

Hey folks,

it's me again with the latest news on performance :).

As some of you probably know: Our current loadbalancer strategy is quite "simple" and doesn't take load in the system into account at all. It hops to the next available invoker after you've invoked an action X times (where X is a fixed value defined at deployment time). In many cases that's suboptimal behavior and induces lots of cold starts, even in a fairly unused system.

To improve on this, here is a proposal to take the loadbalancer state we already have and make something out of it.

In a nutshell, the plan is: Before you schedule to an invoker, take into account how much load is on the invoker you want to schedule to. If it seems full already (determined by outstanding active-ack responses), search for another invoker. Via hashing, we define a home invoker for every subject/action combination. That is the invoker with the highest probability of having a warm container for that action. If that invoker is already busy, choose another invoker. "Stepping" through the invokers should be stable as well, as in: For a given subject/action it should always try the invokers in the same order.
That way, the probability of getting a warm container is higher than if we chose randomly, but of course it gets lower the more "hops" you need to make. The step width is determined by hashing into a series of numbers that are coprime to the number of invokers in the system, which minimizes collisions and chasing.

The proposal is expected to yield a more stable warm-container rate and better utilization of the system as a whole.

I already took a stab at implementing the proposal above. The pull request can be found here: https://github.com/apache/incubator-openwhisk/pull/2360

As always: comments, objections, praise. All feedback is very welcome :)

Cheers,
Markus
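
For readers following the thread, the scheduling scheme described in the proposal can be sketched roughly as below. This is only an illustration in Python and not the code from the pull request. All names (`stable_hash`, `coprime_steps`, `probe_order`, `schedule`), the use of SHA-256, the per-invoker `capacity` threshold, the way the step is picked from the hash, and returning `None` when every invoker is busy are assumptions made for the example.

```python
import hashlib
from math import gcd
from typing import List, Optional


def stable_hash(key: str) -> int:
    """Deterministic hash of a string (assumption: SHA-256, first 8 bytes).

    Python's built-in hash() is salted per process, so it would not give
    a stable home invoker across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def coprime_steps(n: int) -> List[int]:
    """Step widths in [1, n) that are coprime to n (just [1] when n == 1)."""
    return [s for s in range(1, n) if gcd(s, n) == 1] or [1]


def probe_order(subject: str, action: str, n: int) -> List[int]:
    """Stable order in which the n invokers are tried for subject/action."""
    h = stable_hash(subject + "/" + action)
    home = h % n                                  # the "home" invoker
    steps = coprime_steps(n)
    step = steps[(h // n) % len(steps)]           # step width chosen via the hash
    # Because step is coprime to n, this visits every invoker exactly once.
    return [(home + i * step) % n for i in range(n)]


def schedule(subject: str, action: str,
             outstanding: List[int], capacity: int) -> Optional[int]:
    """Pick the first invoker in probe order that is not considered full.

    outstanding[i] is the number of activations sent to invoker i that have
    not yet been acknowledged (active-ack). Returns None if all are full;
    what to do in that case is not covered in this sketch.
    """
    for invoker in probe_order(subject, action, len(outstanding)):
        if outstanding[invoker] < capacity:
            return invoker
    return None


if __name__ == "__main__":
    load = [0] * 10
    print(probe_order("alice", "hello", len(load)))
    print(schedule("alice", "hello", load, capacity=16))
```

Since the step width is coprime to the number of invokers, the probe sequence for a given subject/action is a permutation of all invokers, so a free invoker is always found if one exists, and different subject/action pairs tend to walk the invokers in different orders.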
