bdoyle0182 opened a new issue, #5322:
URL: https://github.com/apache/openwhisk/issues/5322

   The new scheduler sends a container creation message from the scheduler 
to the invoker over Kafka. When the invoker consumes this message, it 
immediately does a get from the artifact db to retrieve the function. However, 
container creation messages can be sent even when there is already a warm 
container available to execute the function; in the code this is referred to 
as a `warmedCreation`. This can occur when a container is in the paused state 
or one of the FSMs has gone into a paused / idle state and needs to be woken 
up.
   
   The problem is that every container creation message triggers an attempt 
to get the function from the db, so the invoker has to re-download the 
function whenever its cache entry has been invalidated. This is suboptimal: 
if we know the action name and revision to execute and there's a matching 
container, that should be all the information we need to deliver the warm 
execution. The download can be very expensive if the function attachment is 
tens of MB.
   
   I believe the problem is exacerbated on the new scheduler for two reasons: 
1. there is no longer a concept of a home invoker, which previously made it 
likely that the function was constantly refreshed in that invoker's cache, and 
2. the cache is now only refreshed on cold starts / container creation 
messages. With the old scheduling algorithm, the db get for the function is 
attempted on every activation received, and cache invalidation is access 
based, so the timeout clock is reset on every activation within an invoker.
   
   Take for example the configuration of:
   `pauseGrace=5 minutes`
   `idleTimeout=10 minutes`
   
   The cache timeout is hardcoded to 5 minutes and is currently not 
configurable.
    
   So a cold start occurs and the function is downloaded for the first time. 
No other execution comes in for five minutes, so the container is paused. At 
around the same time, minute five, the cache invalidates the entry since it 
hasn't been refreshed. At minute 8 a new activation comes in and the scheduler 
decides to send a warmed container creation message to wake up the container, 
which is paused but still exists. The invoker consumes the message and 
attempts to get the function from the cache, but the entry has been 
invalidated and the function has to be re-downloaded.
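   The timeline above can be replayed with a toy model. This is only a 
sketch, not OpenWhisk code: `AccessExpiringCache`, `get_or_load` and the key 
`myAction@rev1` are made-up names, and the only behavior assumed is the one 
described in this issue (entries expire a fixed time after their last access, 
and only creation messages touch the cache).

```python
class AccessExpiringCache:
    """Toy cache whose entries expire `ttl` minutes after their last access."""

    def __init__(self, ttl_minutes):
        self.ttl = ttl_minutes
        self.last_access = {}  # key -> minute of the last read or load

    def get_or_load(self, key, now):
        """Return "hit" if the entry is still fresh, otherwise "download"."""
        last = self.last_access.get(key)
        fresh = last is not None and now - last < self.ttl
        self.last_access[key] = now  # hits and loads both reset the clock
        return "hit" if fresh else "download"


CACHE_TTL = 5  # minutes, the hardcoded cache timeout mentioned above

cache = AccessExpiringCache(CACHE_TTL)

# Minute 0: cold start, the creation message triggers the first download.
cold_start = cache.get_or_load("myAction@rev1", now=0)

# Minute 8: the container is paused but still exists, so the scheduler sends
# a warmed creation message. The cache entry expired at minute 5.
warmed_creation = cache.get_or_load("myAction@rev1", now=8)

print(cold_start, warmed_creation)  # download download
```

   With a warmed creation at minute 4 instead of minute 8, the same model 
reports a hit, which is why the problem only shows up after longer gaps.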
   
   A simple way to reduce the likelihood of this is to increase the cache 
invalidation time to something like 1 hour. However, that doesn't solve the 
problem, because the cache is only refreshed on cold starts.
   
   Let's look at the same example again, but instead of the function 
executing only once, let's say it takes 50ms to run and an execute request is 
sent every 75ms. The initial cold start creates the cache entry, and the 
scheduler then reuses the same container every 75ms for two hours. During 
this time the cache is never refreshed due to the nature of the new scheduler. 
At the one hour mark the cache entry is invalidated. At the two hour mark 
there is a five minute gap in function execution and the container is paused. 
When execution resumes after those five minutes, the first execution, which 
wakes up the container, has to re-download the function attachment.
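   To make the difference between the two refresh policies concrete, here is 
a small sketch that counts downloads for this second scenario. It is a 
simplified model with made-up names (`count_downloads`, the `"create"` / 
`"run"` event kinds), and it uses one activation per minute as a stand-in for 
one every 75ms, which doesn't change the outcome with a 60 minute timeout.

```python
def count_downloads(events, ttl, refresh_on_activation):
    """Count attachment downloads for a list of (minute, kind) events.

    kind is "create" for a container creation message (cold or warmed) and
    "run" for an activation served by an already running container.
    """
    last_refresh = None
    downloads = 0
    for minute, kind in events:
        touches_cache = kind == "create" or refresh_on_activation
        if not touches_cache:
            continue  # new scheduler: warm activations never touch the cache
        if last_refresh is None or minute - last_refresh >= ttl:
            downloads += 1  # entry missing or expired, fetch it again
        last_refresh = minute
    return downloads


# Cold start at minute 0, steady traffic for two hours, a five minute gap,
# then a warmed creation message at minute 125.
events = [(0, "create")] + [(m, "run") for m in range(1, 121)] + [(125, "create")]

# Cache touched only by creation messages, as on the new scheduler.
creation_only = count_downloads(events, ttl=60, refresh_on_activation=False)
# Cache touched on every activation, as described for the old algorithm.
every_activation = count_downloads(events, ttl=60, refresh_on_activation=True)

print(creation_only, every_activation)  # 2 1
```

   In this model the extra download lands exactly on the warmed creation at 
minute 125, no matter how large the timeout is, as long as the steady traffic 
lasts longer than the timeout.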
   
   So the real solution here, imo, is to take function loading off the 
critical path of a warm execution in all cases, which I think should be 
doable. If that's not possible because the function metadata may need to be 
loaded, then we should at least optimize to not load the attachment after 
getting the action document: the action document itself is guaranteed to be 
<1mb, so fetching it from the db should only take milliseconds.
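   As a rough illustration of the first option, the sketch below resumes a 
matching paused container without loading the function at all and only falls 
back to loading when no match exists. None of these names come from the 
OpenWhisk code base: `CreationMessage`, `Container`, `handle_creation`, the 
`warmed` flag and `load_function` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreationMessage:
    action: str
    revision: str
    warmed: bool  # hypothetical flag marking a warmed creation


@dataclass
class Container:
    action: str
    revision: str
    state: str  # "paused" or "running"


def handle_creation(msg, containers, load_function):
    """Resume a matching paused container, or load the function and cold start."""
    if msg.warmed:
        for c in containers:
            same_action = (c.action, c.revision) == (msg.action, msg.revision)
            if same_action and c.state == "paused":
                c.state = "running"  # warm path: no db or cache access
                return "resumed"
    # No matching container: the function (and attachment) is really needed.
    load_function(msg.action, msg.revision)
    containers.append(Container(msg.action, msg.revision, "running"))
    return "cold-started"


loads = []
pool = [Container("myAction", "rev1", "paused")]

def fake_load(action, revision):
    loads.append((action, revision))

warm = handle_creation(CreationMessage("myAction", "rev1", True), pool, fake_load)
cold = handle_creation(CreationMessage("other", "rev7", False), pool, fake_load)
print(warm, cold, loads)  # resumed cold-started [('other', 'rev7')]
```

   The point of the sketch is only that the action name and revision are 
enough to pick the container; whether the real invoker can skip the load 
entirely depends on what else it reads from the action document.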
   
   This is for the most part a tp99 problem, so it's not too bad in general, 
but it can be really detrimental for large function packages.
   
   As a side issue that I noticed while investigating this, the 
`getDocument` metric stops recording once the document is retrieved, before 
the post-processing step that downloads the attachment. We either need a 
separate metric for the attachment download or need to include it in the 
`getDocument` recording. Because the `getDocument` metric showed a 20ms tp99, 
it took me a long time to realize what was really happening here.
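   A sketch of the "separate metric" option, to show the shape of the fix 
rather than the actual instrumentation: `record`, `get_action` and the metric 
name `getAttachment` are hypothetical, and the fetch functions are stand-ins 
for the real db calls.

```python
import time
from contextlib import contextmanager

timings = {}  # metric name -> accumulated seconds


@contextmanager
def record(metric):
    """Add the wall-clock time spent inside the block to `timings[metric]`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[metric] = timings.get(metric, 0.0) + elapsed


def get_action(fetch_document, fetch_attachment):
    with record("getDocument"):
        doc = fetch_document()
    # Timed separately so a slow attachment download is visible on its own.
    with record("getAttachment"):
        code = fetch_attachment(doc)
    return doc, code


def slow_attachment(doc):
    time.sleep(0.02)  # stand-in for a large attachment download
    return b"code"


doc, code = get_action(lambda: {"name": "myAction"}, slow_attachment)
print(sorted(timings))  # ['getAttachment', 'getDocument']
```

   With both spans recorded, a fast `getDocument` next to a slow 
`getAttachment` points directly at the attachment download.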
    
    

