Hi Dominic,

Thanks for sharing your ideas. IIUC (and please keep me honest), the goal of the new design is to improve activation performance. I personally love this; performance is a critical non-functional feature of any FaaS system.

There is something I'd like to call out: the management of containers in a FaaS system can be compared to a JVM. A JVM allocates objects in memory and garbage-collects them when they become unreachable; a FaaS system allocates containers to run actions and "GCs" them when they become idle. If we look at OpenWhisk's scheduling from this perspective, we can reuse proven patterns from the JVM world instead of inventing something new. I'd be interested in the GC implications of the new design, i.e. how idle containers get removed and how that removal is orchestrated.
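To make the analogy concrete, here is a minimal sketch (in Scala) of what a single "GC pass" over idle containers could look like. The names here (ContainerRef, IdleContainerReaper, touch, reap) are hypothetical illustrations, not part of the OpenWhisk codebase or of the proposal.

import java.time.{Duration, Instant}
import scala.collection.concurrent.TrieMap

// Hypothetical types/names for illustration only; not the actual OpenWhisk API.
final case class ContainerRef(id: String, action: String)

class IdleContainerReaper(idleTimeout: Duration) {
  // Tracks when each warm container last served an activation.
  private val lastUsed = TrieMap.empty[ContainerRef, Instant]

  // Called whenever a container finishes an activation.
  def touch(c: ContainerRef): Unit = lastUsed.update(c, Instant.now())

  // One "GC pass": find containers idle longer than the timeout and drop them,
  // analogous to a collector reclaiming unreachable objects in the JVM.
  def reap(now: Instant = Instant.now()): Seq[ContainerRef] = {
    val idle = lastUsed.collect {
      case (c, t) if Duration.between(t, now).compareTo(idleTimeout) > 0 => c
    }.toSeq
    idle.foreach(lastUsed.remove)
    idle // the caller would tear these containers down and release their resources
  }
}

Whether such a pass would live in the new Scheduler, next to the ContainerProxy pool, or somewhere else, and how its timeout interacts with queue assignment, is exactly the orchestration question I'm curious about.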
Thanks,
dragos

On Thu, Apr 4, 2019 at 8:40 AM Matt Sicker <[email protected]> wrote:
> Would it make sense to define an OpenWhisk Improvement/Enhancement
> Proposal or similar that various other Apache projects do? We could
> call them WHIPs or something. :)
>
> On Thu, 4 Apr 2019 at 09:09, David P Grove <[email protected]> wrote:
> >
> > Dominic Kim <[email protected]> wrote on 04/04/2019 04:37:19 AM:
> > >
> > > I have proposed a new architecture.
> > > https://cwiki.apache.org/confluence/display/OPENWHISK/New+architecture+proposal
> > >
> > > It includes many controversial agendas and breaking changes.
> > > So I would like to form a general consensus on them.
> >
> > Hi Dominic,
> >
> > There's much to like about the proposal. Thank you for writing it up.
> >
> > One meta-comment is that the work will have to be done in a way so that there are no actual "breaking changes". It has to be possible to continue to configure the system using the existing architectures while this work proceeds. I would expect this could be done via a new LoadBalancer and some deployment options (similar to how Lean OpenWhisk was done). If work needs to be done to generalize the LoadBalancer SPI, that could be done early in the process.
> >
> > On the proposal itself, I wonder if the complexity of Leader/Follower is actually needed? If a Scheduler crashes, it could be restarted and then resume handling its assigned load. I think there should be enough information in etcd for it to recover its current set of assigned ContainerProxys and carry on. Activations in its in-memory queues would be lost (a bigger blast radius than the current architecture), but I don't see that the Leader/Follower changes that (it seems way too expensive to be replicating every activation in the Follower Queues). The Leader/Follower would allow for shorter downtime for those actions assigned to the downed Scheduler, but at the cost of significant complexity. Is it worth it?
> >
> > Perhaps related to the Leader/Follower, it's not clear to me how activation messages are being pulled from the action topic in Kafka during the Queue creation window. I think they have to go somewhere (because there is a mix of actions on a single Kafka topic and we can't stall other actions while waiting for a Queue to be created for a new action), but if you don't know yet which Scheduler is going to win the race to be the Leader, how do you know where to put them?
> >
> > --dave
>
> --
> Matt Sicker <[email protected]>
