style95 commented on issue #5256:
URL: https://github.com/apache/openwhisk/issues/5256#issuecomment-1145878594

   Let me share the current behavior and my opinion.
   
   The new scheduler was designed on the assumption that latency is the most important factor.
   Some of our downstream users were sensitive to latency; some were even concerned about hundreds of milliseconds of wait time.
   And we didn't want to accept a few seconds of wait time.
   
   Based on this idea, let me share how the new scheduler works.
   First, the scheduler looks up the average duration of the given action.
   Once the action has been invoked at least once, there are activations from which we can figure out the average duration.
   This is handled by `ElasticSearchDurationChecker`. When a memory queue starts up, it tries to get the average duration.
   Once we have the duration, we can estimate the processing power of one container for the given action.
   
   For example, if the duration is 10ms, one container can theoretically handle 100 activations in 1 second, while it can handle only 1 activation per second for an action with a 1s duration. From this we can easily calculate the required number of containers.
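
   As a rough sketch of that calculation (illustrative names and numbers only, not the actual scheduler code):

   ```scala
   // Illustrative sketch of the sizing math above; these names and numbers are
   // assumptions, not the actual scheduler code.
   val avgDurationMs = 10.0             // observed average duration of the action
   val incomingPerSecond = 300.0        // assumed incoming rate of activations

   // One container can process this many activations per second.
   val perContainerThroughput = 1000.0 / avgDurationMs            // 100 activations/s

   // Containers needed to keep up with the incoming rate.
   val requiredContainers =
     math.ceil(incomingPerSecond / perContainerThroughput).toInt  // 3
   ```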
   
   If an action has never been invoked, we can't figure out the average duration, so the scheduler just creates one container.
   If the action finishes quickly, we can then compute the average duration and proceed as described above.
   On the other hand, if it takes longer than the scheduling interval (100ms) to finish, all we know is that the duration is greater than 100ms; it could be 1s, or it could be 10s. Since we have no estimate yet, the scheduler adds as many containers as there are stale activations in the queue.
   This is where staleness is introduced.
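
   A simplified sketch of that decision (hypothetical names, not the real scheduler code) might look like this:

   ```scala
   // Hypothetical sketch of the decision described above; not the real scheduler code.
   def containersToAdd(avgDurationMs: Option[Double],
                       staleActivations: Int,
                       schedulingIntervalMs: Double = 100.0): Int =
     avgDurationMs match {
       // Duration known: size by how many activations one container can finish
       // within the scheduling interval.
       case Some(duration) =>
         val perContainerThroughput = schedulingIntervalMs / duration
         math.ceil(staleActivations / perContainerThroughput).toInt
       // Duration unknown (the action is still running past the interval):
       // add one container per stale activation.
       case None =>
         staleActivations
     }
   ```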
   
   One more thing to consider is that, even for short-running actions, some activations can become stale.
   Even if we calculated the required number of containers correctly, the duration can vary, and some messages can become stale while containers are running.
   This means the existing containers are not enough to handle the existing activations.
   Say 10 activations arrive every 100 milliseconds but the existing containers can only handle 7 activations per 100ms; then we need to add more containers. So we calculate the required number of additional containers based on the number of stale activations and the average duration.
   
   ```scala
   // Number of activations one container can process within the stale threshold.
   val containerThroughput = StaleThreshold / duration
   // Containers needed to drain the currently available messages (ceiling rounds up to an Int).
   val num = ceiling(availableMsg.toDouble / containerThroughput)
   ```
   
   Also, if the calculated `num` is 5 while there are only 3 activations in the queue, we don't need to add 5 containers, as 2 of them would be idle.
   So we only add 3 containers. Considering that this case can happen repeatedly because container creation generally takes more than 100ms, we should also take the number of in-progress (being created) containers into account.
   
   ```scala
   // Cap the number of new containers at the number of waiting messages,
   // then subtract the containers that are already being created (in progress).
   val actualNum = (if (num > availableMsg) availableMsg else num) - inProgress
   ```
   
   This is basically how the new scheduler works.
   
   Now the issue is that for long-running actions, such as one with a 10s duration, the per-container throughput is 0.01 (100ms / 10s), so the scheduler tries to create as many containers as there are activations.
   So when 10 activations come, it will try to create 10 containers to handle them. When 100 activations come, it will create 100 containers.
   (At the very beginning it will only add one initial container.)
   Since this could end up consuming all resources, we have to properly throttle it with the namespace limit.
   If the namespace limit is 30, then only 30 containers are created and 70 activations wait in the queue.
   Only after 40 seconds, after 4 rounds, are all activations handled.
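
   To make the arithmetic concrete, a tiny back-of-the-envelope sketch using the numbers from this example:

   ```scala
   // Back-of-the-envelope check of the example above (assumed numbers).
   val activations = 100        // activations waiting in the queue
   val namespaceLimit = 30      // max containers allowed for the namespace
   val actionDurationSec = 10   // duration of the long-running action

   // Each "round" runs namespaceLimit activations for actionDurationSec.
   val rounds = math.ceil(activations.toDouble / namespaceLimit).toInt // 4
   val totalWaitSec = rounds * actionDurationSec                       // 40 seconds
   ```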
   
   This is just an example, but we felt 40s of wait time was too much, and we wanted to minimize the wait time no matter what kind of action is running.
   The downside is that this can create a huge number of containers within a short period of time, which could overload the Docker engine or the K8s API server.
   Also, if one action spawns a huge number of containers, it affects other actions too, since the Docker engine is busy creating them.
   
   Regarding the idea of increasing the staleness threshold, I am not sure. Some users may still want short wait times even if their actions are long-running.
   Maybe we can introduce another throttle for container creation, and it should consider fairness among actions.
   Also, on the invoker side, containers should be created in batches with a limit on the number of containers in each batch.
   (The Docker client already has such batching, but the K8s client doesn't.)
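
   For example, a batching throttle on the invoker side could look roughly like the following sketch (purely illustrative; the names are made up and this is not existing invoker code):

   ```scala
   // Purely illustrative sketch of fair, batched container creation; the names
   // are made up and this is not the existing invoker code.
   def fairCreationBatches[A](requestsByAction: Map[String, Seq[A]],
                              batchSize: Int): Seq[Seq[A]] = {
     val maxLen =
       if (requestsByAction.isEmpty) 0 else requestsByAction.values.map(_.size).max
     // Round-robin across actions so that no single action monopolizes creation.
     val interleaved =
       (0 until maxLen).flatMap(i => requestsByAction.values.flatMap(_.lift(i)))
     // Limit how many creations are submitted to the container engine at once.
     interleaved.grouped(batchSize).map(_.toSeq).toSeq
   }
   ```

   Each batch would then be submitted before the next one starts, which naturally limits how many concurrent creation requests hit the Docker engine or the K8s API server.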
   
   It would also be great for OW operators to be able to control the aggressiveness of the scheduler (whether it creates containers more or less aggressively).

