One related question on this is: how is throttling handled? Looking at the code it looks like each controller instance will maintain its own stats (totalActivations, activationsPerNamespace) independently. Am I missing it?
Thanks Tyson > On Feb 1, 2018, at 7:17 AM, Stephen Fink <[email protected]> wrote: > > Perfect. Thanks. > > SJF > >> On Feb 1, 2018, at 10:12 AM, Markus Thoemmes <[email protected]> >> wrote: >> >> Hi Steve, >> >> fair point and that's exactly what we're trying to exploit here: While the >> loadbalancer only sees 8 slots for example on that invoker, the invoker's >> logic remains unchanged. It always sees the full-picture of its state and >> doesn't care where the load is coming from. So OpenWhisk would (given the >> load is slow enough) only provision one container for this action. Both >> loadbalancers would operate using the same hashes and and stepSizes and thus >> choose the very same invoker (given no other load in the system). >> >> Cheers. >> >> -----Stephen Fink <[email protected]> wrote: ----- >> To: [email protected] >> From: Stephen Fink <[email protected]> >> Date: 02/01/2018 03:57PM >> Subject: Re: Introducing sharding as an alternative for state sharing >> >> Hi Markus, >> >> Suppose I have a client which fires one action repeatedly in a loop. So >> the server handles one invocation of this action at a time. >> >> Under the current system, OW will provision one container for this action, >> and it will get reused for every invocation. >> >> Under horizontal sharding with factor N, it seems OW will provision N >> containers — which is sub-optimal for several reasons (cold starts, memory >> footprint, consumption of stem cells). >> >> Do you agree? >> >> If so — I wonder if there’s a hybrid strategy — suppose the load balancer >> operates via horizontal sharding, but the invoke-side logic remains >> unchanged. Then an invoker can continue to use all slots available to >> maximize container reuse, while the load balancer avoids shared state. >> >> Make sense? >> >> SJF >> >> >>> On Feb 1, 2018, at 9:25 AM, Markus Thoemmes <[email protected]> >>> wrote: >>> >>> Hi folks, >>> >>> we (Christian Bickel and I) just opened a pull-request for comments on >>> introducing a notion of sharding instead of sharing the state between our >>> controllers (loadbalancers). It also addresses a few deficiencies of the >>> old loadbalancer to remove any kinds of bottlenecks there and make it as >>> fast as possible. >>> >>> Commit message for posterity: >>> >>> The current ContainerPoolBalancer suffers a couple of problems and >>> bottlenecks: >>> >>> 1. Inconsistent state: The data-structures keeping the state for that >>> loadbalancer are not thread-safely handled, meaning there can be queuing to >>> some invokers even though there is free capacity on other invokers. >>> 2. Asynchronously shared state: Sharing the state is needed for a >>> high-available deployment of multiple controllers and for horizontal scale >>> in those. Said state-sharing makes point 1 even worse and isn't anywhere >>> fast enough to be able to efficiently schedule quick bursts. >>> 3. Bottlenecks: Getting the state from the outside (like for the >>> ActivationThrottle) is a very costly operation (at least in the shared >>> state case) and actually bottlenecks the whole invocation path. Getting the >>> current state of the invokers is a second bottleneck, where one request is >>> made to the corresponding actor for each invocation. >>> This new implementation aims to solve the problems mentioned above as >>> follows: >>> >>> 1. All state is local: There is no shared state. Resources are managed >>> through horizontal sharding. Horizontal sharding means: The invokers' slots >>> are evenly divided between the loadbalancers in existence. If we deploy 2 >>> loadbalancers and each invoker has 16 slots, each of the loadbalancers will >>> have access to 8 slots on each invoker. >>> 2. Slots are given away atomically: When scheduling an activation, the slot >>> is immediately assigned to that activation (implemented through >>> Semaphores). That means: Even in concurrent schedules, there will not be an >>> overload on an invoker as long as there is capacity left on that invoker. >>> 3. Asynchronous updates of slow data: Slowly changing data, like a change >>> in the invoker's state, is asynchronously handled and updated to a local >>> version of the state. Querying the state is as cheap as it can be. >>> >>> A few words on the implementation details: >>> >>> We chose to use horizontal sharding (evenly dividing the capacity of each >>> invoker) vs. vertical sharding (evenly dividing the invokers as a whole) >>> for the sake of staging these changes mainly. Once we divide vertically, >>> we'll need a good loadbalancing strategy in front of our controllers >>> themselves, to keep unnecessary cold-starts to a minimum and maximize >>> container reuse. By dividing horizontally, we maintain the same reuse >>> policies as today and can even keep the same loadbalancing strategies >>> intact. Horizontal sharding of course only scales so far (maybe to 4 >>> controllers, assuming 16 slots on each invoker) but it will give us time to >>> figure out good strategies for vertical sharding and learn along the way. >>> For vertical sharding to work, it will also be crucial to have the >>> centralized overflow queue to be able to offload work between shards >>> through workstealing. All in all: Vertical sharding is a much bigger change >>> than horizontal sharding. >>> >>> We tried to implement everything in a single actor first, but that seemed >>> to impose a bottleneck again. Note that this is very frequented code, it >>> needs to be very tight. That might not match the actor model too well. >>> >>> Subsuming everything: This keeps all proposed changes intact (most notably >>> Tyson's parallel executions and overloading queue). >>> >>> A note on the gains made by this: Our non-blocking invoke performance is >>> now quite close to the raw Kafka produce performance that we have in the >>> system (not that it's good in itself, but that's the next theoretical >>> bottleneck). Before the changes, this was roughly bottlenecked to 4k >>> requests per second on a sufficiently powerful machine. Blocking invoke >>> performance was roughly doubled. >>> >>> Any comments, thoughts? >>> >>> Cheers, >>> Markus >>> >> >> >
