Dear Markus,

Thank you for the quick response.

> In the proposal, the semantics of talking to the container do not change
> from what we have today. If the HTTP request fails for whatever reason
> while "in-flight", processing cannot be completed, just like if an invoker
> crashes. Note that we commit the message immediately after reading it
> today, which leads to "at-most-once" semantics. OpenWhisk does not support
> "at-least-once" today. To do so, you'd retry from the outside.
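To make the commit-ordering point concrete, here is a minimal sketch (my own illustration, not OpenWhisk code; the `Partition` class is a made-up in-memory stand-in for a Kafka partition) of how the position of the offset commit relative to processing decides the delivery guarantee:

```python
# Sketch: commit-before-process gives at-most-once, commit-after-process
# gives at-least-once. `Partition` is a toy stand-in for a Kafka partition.

class Partition:
    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0                  # offset of the next unread message

    def poll(self):
        return self.messages[self.committed:]

def consume_at_most_once(partition, handle):
    for msg in partition.poll():
        partition.committed += 1            # commit BEFORE processing (today)
        handle(msg)                         # a crash here loses the message

def consume_at_least_once(partition, handle):
    for msg in partition.poll():
        handle(msg)                         # a crash here -> message re-read
        partition.committed += 1            # commit only AFTER processing

def crash_once(log):
    """Handler that fails on its first call and succeeds on later calls."""
    def handle(msg):
        if not log:
            log.append("crashed")
            raise RuntimeError("invoker died mid-flight")
        log.append(msg)
    return handle
```

With commit-before-process, a crash mid-flight loses the activation on restart; with commit-after-process, the activation is re-read and may run twice, which is exactly the problem for non-idempotent actions.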
Ah yes. Now I remember wondering why OpenWhisk doesn't support
"at-least-once" semantics. This question is apart from the new architecture,
but is this because users can execute non-idempotent actions? That is, even
though an invoker has failed, the action could already have been executed,
and retrying could cause side effects, such as running an action that
requires "at-most-once" semantics more than once?

> I don't believe the ContainerManager needs to do that much honestly. In
> the Kubernetes case for instance it only asks the Kube API for pods and
> then keeps a list of these pods per action. Further it divides this list
> whenever a new container is added/removed. I think we can push this quite
> far in a master/slave fashion as Brendan mentioned. This is guessing
> though, it'll be crucial to measure the throughput that one instance
> actually can provide and then decide on whether that's feasible or not.
>
> As its state isn't moving at a super fast pace, we can probably afford to
> persist it into something like redis or etcd for the failover to take over
> if one dies.
>
> Of course I'm very open for scaling it out horizontally if that's
> achievable.

Yes, reducing the requests against the Docker daemon seems the right way to
go. BTW, how long would warmed containers be kept in the new architecture?
Is it 1 or 2 orders of magnitude in seconds?

> Assuming general round-robin (or even random scheduling) in front of the
> controllers would even out things to a certain extent, wouldn't they?
>
> Another probably feasible solution is to implement session stickiness or
> hashing as you mentioned in today's call. Another comment that was raised
> during today's call would come into play there as well: We could change
> the container division algorithm to not divide evenly but to only give
> containers to those controllers that requested them. In conjunction with
> session stickiness, that could yield better load distribution results
> (given the session stickiness is smart enough to divide appropriately).
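As a rough illustration of the hashing variant (my own sketch, not part of the proposal; `controller_for` and the controller list are made up), sticking an action to one controller could be as simple as hashing its fully-qualified name:

```python
# Sketch: deterministic action -> controller mapping, so requests for the
# same action always land on the same controller, which then accumulates
# that action's containers.

import hashlib

def controller_for(action, controllers):
    """Deterministically map an action name to one of the controllers."""
    digest = hashlib.sha256(action.encode("utf-8")).digest()
    return controllers[int.from_bytes(digest[:8], "big") % len(controllers)]
```

One caveat with plain modulo hashing: when a controller joins or leaves, most actions get remapped. A consistent-hashing ring would limit the reshuffling, which matters if warm containers should stay useful across membership changes.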
Yes, that could be an option. I'm concerned it might cause some load
imbalance among controllers as well. And another question comes up: how can
we keep sessions sticky across multiple controllers, and for multiple
actions respectively?

> Overload of a given container is determined by its concurrency limit.
> Today that is "1". If 1 request is active in a container, and that is the
> only container, we need more containers. As soon as all containers reached
> their maximum concurrency, we need to scale up.

In the new architecture, will the concurrency limit be controlled by users
on a per-action basis? So in case a user wants to execute a long-running
action, does he configure the concurrency limit for that action? And if the
concurrency limit is 1 and the action container is occupied, wouldn't
controllers request a container again and again? And if containers can only
be created synchronously (one by one), couldn't that be a burden in case a
user wants a huge number of (100~200) simultaneous invocations?

Please bear with my many questions. I am also one of the advocates who want
to improve and enhance OW in such a way. I hope my questions help to build
a more refined architecture.

Thanks
Best regards
Dominic

2018-07-19 2:16 GMT+09:00 Markus Thoemmes <markus.thoem...@de.ibm.com>:

> Hi Dominic,
>
> thanks for your feedback, let's see...
>
> >1. Buffering of activation and failure handling.
> >
> >As of now, Kafka acts as a kind of buffer in case activation processing
> >is a bit delayed due to some reasons such as invoker failure.
> >If Kafka is only used for the overflowing case, how can it guarantee
> >"at least once" activation processing?
> >For example, if a controller receives the requests and it crashes before
> >the activation is complete, how can other alive controllers or the
> >restored controller handle it?
>
> In the proposal, the semantics of talking to the container do not change
> from what we have today.
> If the HTTP request fails for whatever reason while "in-flight",
> processing cannot be completed, just like if an invoker crashes. Note
> that we commit the message immediately after reading it today, which
> leads to "at-most-once" semantics. OpenWhisk does not support
> "at-least-once" today. To do so, you'd retry from the outside.
>
> >2. A bottleneck in ContainerManager.
> >
> >Now ContainerManager has many critical logics.
> >It takes care of the container lifecycle, logging, scheduling and so on.
> >Also, it should be aware of the whole container state as well as
> >controller status and distribute containers among alive controllers.
> >It might need to do some health checking to/from all controllers and
> >containers (maybe an orchestrator such as k8s).
> >
> >I think in this case ContainerManager can be a bottleneck as the size of
> >the cluster grows.
> >And if we add more nodes to scale out ContainerManager or to prepare for
> >the SPOF, then all states of ContainerManager should be shared among all
> >nodes.
> >If we take a master/slave approach, the master would become a bottleneck
> >at some point, and if we take a clustering approach, we need some
> >mechanism to synchronize the cluster status among ContainerManagers.
> >And this procedure should be done in 1 or 2 orders of magnitude in
> >milliseconds.
> >
> >Do you have anything in your mind to handle this?
>
> I don't believe the ContainerManager needs to do that much honestly. In
> the Kubernetes case for instance it only asks the Kube API for pods and
> then keeps a list of these pods per action. Further it divides this list
> whenever a new container is added/removed. I think we can push this quite
> far in a master/slave fashion as Brendan mentioned. This is guessing
> though, it'll be crucial to measure the throughput that one instance
> actually can provide and then decide on whether that's feasible or not.
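For illustration only (my own sketch under stated assumptions, not ContainerManager code; `divide` is a made-up helper), the per-action list division described in the quote above could look like:

```python
# Sketch: re-dividing one action's container list across controllers, to be
# re-run whenever a container is added or removed. This is the "divide
# evenly" variant; the "only give containers to requesters" variant would
# restrict the controller list first.

def divide(containers, controllers):
    """Round-robin split of one action's containers across controller ids."""
    assignment = {ctrl: [] for ctrl in controllers}
    for i, container in enumerate(containers):
        assignment[controllers[i % len(containers) and i % len(controllers)]].append(container)
    return assignment
```

Recomputing a small dict like this on every add/remove is cheap, which supports the point that the ContainerManager's state changes slowly enough to persist to redis or etcd for failover.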
> As its state isn't moving at a super fast pace, we can probably afford to
> persist it into something like redis or etcd for the failover to take
> over if one dies.
>
> Of course I'm very open for scaling it out horizontally if that's
> achievable.
>
> >3. Imbalance among controllers.
> >
> >I think there could be some imbalance among controllers.
> >For example, there are 3 controllers with 3, 1, and 1 containers
> >respectively for the given action.
> >In some cases, 1 container in controller1 might be overloaded but 2
> >containers in controller2 can be available.
> >If the number of containers for the given action belonging to each
> >controller varies, it could happen more easily.
> >This is because controllers are not aware of the status of other
> >controllers.
> >So in some cases, some action containers are overloaded but the others
> >may handle just moderate requests.
> >Then each controller may request more containers instead of utilizing
> >existing (but in other controllers) containers, and this can lead to
> >the waste of resources.
>
> Assuming general round-robin (or even random scheduling) in front of the
> controllers would even out things to a certain extent, wouldn't they?
>
> Another probably feasible solution is to implement session stickiness or
> hashing as you mentioned in today's call. Another comment that was raised
> during today's call would come into play there as well: We could change
> the container division algorithm to not divide evenly but to only give
> containers to those controllers that requested them. In conjunction with
> session stickiness, that could yield better load distribution results
> (given the session stickiness is smart enough to divide appropriately).
>
> >4. How do controllers determine whether to create more containers?
> >
> >Let's say a controller has only one container for the given action.
> >How do controllers recognize that this container is overloaded and more
> >containers need to be created?
> >If the action execution time is short, it can calculate the number of
> >buffered activations for the given action.
> >But if the action execution time is long, let's say 1 min or 2 mins,
> >then even though there is only 1 activation request in the buffer, the
> >controller needs to create more containers.
> >(Because subsequent activation requests will be delayed for 1 or 2 mins.)
> >Since we cannot know the execution time of an action in advance, we may
> >need a sort of timeout (of activation response) approach for all actions.
> >But still, we cannot know how much execution time remains for the given
> >action after the timeout occurred.
> >Further, if a user requests 100 or 200 concurrent invocations with a
> >2-min-long action, all subsequent requests will suffer from the latency
> >overhead of the timeout.
>
> Overload of a given container is determined by its concurrency limit.
> Today that is "1". If 1 request is active in a container, and that is the
> only container, we need more containers. As soon as all containers
> reached their maximum concurrency, we need to scale up.
>
> We do the same thing today I believe.
>
> Does that answer your questions? (Sorry for the broken quote layout, my
> mail client screws these up)
>
> Cheers,
> Markus
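The scale-up rule quoted above (every container for an action at its concurrency limit means we need another one) can be sketched as follows; this is my own illustration with a made-up helper name, where the default `limit=1` reflects today's behavior and a per-action user-configurable limit would simply change that argument:

```python
# Sketch: decide whether an action needs another container, based purely on
# per-container concurrency rather than on action execution time.

def needs_more_containers(active_per_container, limit=1):
    """active_per_container: in-flight request count for each container
    currently serving one action."""
    if not active_per_container:
        return True                       # no container exists yet
    return all(n >= limit for n in active_per_container)
```

Note this sidesteps the timeout question raised above: the decision needs no estimate of remaining execution time, only the in-flight request counts, so a 2-min action with one busy container triggers scale-up as soon as a second request arrives.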