Hi Markus,

One aspect which is currently not clear to me is which of the following the Controller has access to:
1. A pool of prewarm containers, i.e. containers of the base image where /init
   has not yet been done, so these containers can then be initialized within
   the Controller, OR
2. A pool of warm containers bound to a specific user+action. These containers
   would presumably have been initialized by the ContainerManager, which then
   allocates them to a Controller.

> The scaleup model stays exactly the same as today! If you have 200
> simultaneous invocations (assuming a per-container concurrency limit of 1) we
> will create 200 containers to handle that load (given the requests are truly
> simultaneous --> arrive at the same time). Containers are NOT created in a
> synchronous way and there's no need to sequentialize their creation. Does
> something in the proposal hint to that? If so, we should fix that immediately.

Can you elaborate on this a bit more, i.e. how the scale-up logic would work
and in what sense it is asynchronous? I think the aspect above (the type of
pool) has a bearing on the scale-up logic. If an action was not in use so far,
then when the first request comes in (i.e. the 0-to-1 scale-up case), would
the Controller ask the ContainerManager for an action-specific container and
then wait for its setup before executing? Or, if it has a generic pool, would
it take one container, initialize it itself, and use it? And if this is not
done synchronously, would such an activation be put onto the overflow queue?

Chetan Mehrotra

On Thu, Jul 19, 2018 at 2:39 PM Markus Thoemmes
<markus.thoem...@de.ibm.com> wrote:
>
> Hi Dominic,
>
> >Ah yes. Now I remember I wondered why OS doesn't support "at-least-once"
> >semantic. This is the question apart from the new architecture, but is this
> >because of the case that user can execute the non-idempotent action? So
> >though an invoker is failed, still action could be executed and it could
> >cause some side effects such as repeating the action which requires
> >"at-most-once" semantic more than once?
>
> Exactly. Once we pass the HTTP request into the container, we cannot know
> whether the action has already caused a side-effect.
> At that point it's not safe to retry (hence /run doesn't allow for retries
> vs. /init does) and in doubt we need to abort. We could imagine the user to
> state idempotency of an action so it's safe for us to retry, but that's a
> different can of worms and imho unrelated to the architecture as you say.
>
> >BTW, how would long warmed containers be kept in the new architecture? Is
> >it a 1 or 2 order of magnitude in seconds?
>
> I don't see a reason to change this behavior from what we have today. Could
> be configurable and potentially be hours. The only concerns are:
> - Scale-down of worker nodes is inhibited if we keep containers around a
> long time --> costs the vendor money
> - If the system is full with warm containers and we want to evict one to
> make space for a different container, removing and recreating a container is
> more expensive than just creating.
>
> >In the new architecture, concurrency limit is controlled by users in a
> >per-action based way?
>
> That's not necessarily architecture related, but Tyson is implementing this,
> yes. Note that this is "concurrency per container" not "concurrency per
> action" (which could be a second knob to turn).
>
> In a nutshell:
> - concurrency per container: The amount of parallel HTTP requests allowed
> for a single container (this is what Tyson is implementing)
> - concurrency per action: You could potentially limit the maximum amount of
> concurrent invocations running for each action (which is distinct from the
> above, because this could mean to limit the amount of containers created vs.
> limiting the amount of parallel HTTP requests to a SINGLE container)
>
> >So in case a user wants to execute the long-running action, does he
> >configure the concurrency limit for the action?
>
> Long running isn't related to concurrency I think.
>
> >And if concurrency limit is 1, in case action container is possessed,
> >wouldn't controllers request a container again and again?
> >And if it only allows container creation in a synchronous way (creating one
> >by one), couldn't it be a burden in case a user wants a huge number of
> >(100~200) simultaneous invocations?
>
> The scaleup model stays exactly the same as today! If you have 200
> simultaneous invocations (assuming a per-container concurrency limit of 1)
> we will create 200 containers to handle that load (given the requests are
> truly simultaneous --> arrive at the same time). Containers are NOT created
> in a synchronous way and there's no need to sequentialize their creation.
> Does something in the proposal hint to that? If so, we should fix that
> immediately.
>
> No need to apologize, this is great engagement, exactly what we need here.
> Keep it up!
>
> Cheers,
> Markus
>
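(For illustration, the scale-up model Markus describes above can be sketched
as follows. This is not OpenWhisk code; the names `create_container` and
`handle_burst` are hypothetical stand-ins for the Controller asking the
ContainerManager for containers. The point is only that a burst of truly
simultaneous invocations triggers container creations in parallel, not one
after another.)

```python
from concurrent.futures import ThreadPoolExecutor
import itertools

# Monotonic id source, standing in for whatever the ContainerManager
# would use to identify newly created containers.
_ids = itertools.count(1)

def create_container(action):
    # Stand-in for a single (possibly slow) container-creation request.
    return f"{action}-container-{next(_ids)}"

def handle_burst(action, n_invocations, concurrency_per_container=1):
    # Containers needed for a truly simultaneous burst: ceil(N / limit).
    needed = -(-n_invocations // concurrency_per_container)
    # All creations are submitted at once and run concurrently; nothing
    # sequentializes them.
    with ThreadPoolExecutor(max_workers=needed) as pool:
        futures = [pool.submit(create_container, action) for _ in range(needed)]
        return [f.result() for f in futures]

# 200 simultaneous invocations, per-container concurrency limit of 1:
containers = handle_burst("myAction", 200)
assert len(containers) == 200
```

With a per-container concurrency limit of, say, 10, the same burst would need
only 20 containers, which is the "concurrency per container" knob discussed
above.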