[GitHub] [openwhisk] bdoyle0182 commented on issue #5325: [New Scheduler] Container unpausing results in key remaining in etcd for deleted container

GitBox Wed, 14 Sep 2022 18:33:53 -0700


bdoyle0182 commented on issue #5325:
URL: https://github.com/apache/openwhisk/issues/5325#issuecomment-1247467187


   Actually, it's simpler than that. This is an unhandled failure case in the paused state, since the `StateTimeout` handler below is the only place in `FunctionPullingContainerProxy` that can return an etcd error:
   
   ```scala
   case Event(StateTimeout, data: WarmData) =>
     // If any lookup Future fails (e.g. the etcd query), the whole for-comprehension fails.
     (for {
       count <- getLiveContainerCount(data.invocationNamespace, data.action.fullyQualifiedName(false), data.revision)
       (warmedContainerKeepingCount, warmedContainerKeepingTimeout) <- getWarmedContainerLimit(data.invocationNamespace)
     } yield {
       logging.info(
         this,
         s"Live container count: ${count}, warmed container keeping count configuration: ${warmedContainerKeepingCount} in namespace: ${data.invocationNamespace}")
       if (count <= warmedContainerKeepingCount) {
         Keep(warmedContainerKeepingTimeout)
       } else {
         Remove
       }
     }).pipeTo(self) // on failure, pipeTo delivers akka.actor.Status.Failure(cause) to self
     stay
   ```
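Both lookups in the `for` comprehension return `Future`s, so if either one fails, the whole comprehension fails, and `pipeTo(self)` delivers that failure to the actor as an `akka.actor.Status.Failure` instead of a `Keep`/`Remove` decision. The sketch below illustrates that propagation using only the Scala standard library; `getLiveContainerCount`, `getWarmedContainerLimit`, `Keep` and `Remove` here are simplified hypothetical stand-ins, not the actual OpenWhisk signatures.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object PausedTimeoutQuery {
  sealed trait Decision
  final case class Keep(timeout: FiniteDuration) extends Decision
  case object Remove extends Decision

  // Hypothetical stand-in for the etcd-backed live container count; always fails here.
  def getLiveContainerCount(): Future[Long] =
    Future.failed(new RuntimeException("etcd unavailable"))

  // Hypothetical stand-in for the namespace warm-container limit; succeeds here.
  def getWarmedContainerLimit(): Future[(Int, FiniteDuration)] =
    Future.successful((1, 10.minutes))

  // Same shape as the handler: a failure in either step fails the combined Future.
  def decide(): Future[Decision] =
    for {
      count <- getLiveContainerCount()
      (keepCount, keepTimeout) <- getWarmedContainerLimit()
    } yield if (count <= keepCount) Keep(keepTimeout) else Remove

  // Blocking only for demonstration; the actor pipes the Future to itself instead.
  def awaitDecision(): Try[Decision] = Try(Await.result(decide(), 5.seconds))
}
```

Calling `PausedTimeoutQuery.awaitDecision()` yields a `Failure` carrying the "etcd unavailable" exception; in the proxy, that same failure is what arrives as a message rather than a decision.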
   
   When the state times out, the query to etcd fails, and the failure is piped back to the actor as a message. However, the paused state has no case for that failure message, so it falls through to the catch-all below and gets stashed. The container then sits around indefinitely until a new activation comes in.
   
   `case _ => delay`
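The stash-and-replay sequence can be modeled without Akka. In this sketch all names are hypothetical: `PipedFailure` stands in for the `Status.Failure` that `pipeTo` sends, the paused handler's catch-all mirrors `case _ => delay` by stashing, and the transition to `Running` replays the stash the way `onTransition` does in the proxy. The stale etcd failure therefore only surfaces once a new activation arrives.

```scala
import scala.collection.mutable

object PausedStashModel {
  sealed trait Msg
  case object NewActivation extends Msg
  // Stand-in for akka.actor.Status.Failure delivered by pipeTo.
  final case class PipedFailure(cause: Throwable) extends Msg

  sealed trait State
  case object Paused extends State
  case object Running extends State

  final class Proxy {
    var state: State = Paused
    val stash: mutable.Queue[Msg] = mutable.Queue.empty
    val log: mutable.ListBuffer[String] = mutable.ListBuffer.empty

    def receive(msg: Msg): Unit = state match {
      case Paused =>
        msg match {
          case NewActivation =>
            log += "unpausing container"
            state = Running
            unstashAll() // mirrors unstashing in onTransition
          case other =>
            stash.enqueue(other) // mirrors `case _ => delay`: no dedicated failure case
        }
      case Running =>
        msg match {
          case PipedFailure(t) => log += s"stale failure handled in Running: ${t.getMessage}"
          case other           => log += s"running: $other"
        }
    }

    // Replays deferred messages against the current (new) state.
    private def unstashAll(): Unit =
      while (stash.nonEmpty) receive(stash.dequeue())
  }
}
```

Feeding a `PipedFailure` to a paused `Proxy` leaves it stashed; a later `NewActivation` moves the model to `Running` and only then processes the old failure, out of context.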
   
   When a new activation comes in, it tries to wake up the warmed container, playing out all of the events I described in my previous messages, starting with the attempt to unpause the container. The proxy then transitions to `Running`, and `onTransition` unstashes the stale etcd failure message. So we just need to add proper failure handling to the `StateTimeout` event in the paused state. As further confirmation that this is exactly what's happening, the gap in the logs between when the container was paused and when the new activation came in is much greater than the `idleTimeout`.
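A minimal sketch of the fix in the same kind of simplified model (no Akka, all names hypothetical): the paused state gets an explicit case for the piped failure instead of letting it fall through to the stash. The comment does not say what the handler should do, so this sketch assumes it logs the error and re-arms the idle timeout so the keep-or-remove query is retried; removing the container would be an equally plausible fallback. In the real proxy this would presumably be a `case Event(Status.Failure(t), data: WarmData)` clause in the paused state.

```scala
import scala.collection.mutable

object PausedFailureFix {
  sealed trait Msg
  case object StateTimeout extends Msg
  case object OtherMessage extends Msg // any message the paused state still defers
  // Stand-in for akka.actor.Status.Failure delivered by pipeTo.
  final case class PipedFailure(cause: Throwable) extends Msg

  final class PausedProxy {
    val stash: mutable.Queue[Msg] = mutable.Queue.empty
    val log: mutable.ListBuffer[String] = mutable.ListBuffer.empty
    var timeoutArmed: Boolean = true // hypothetical flag modeling the idle timer
    var retries: Int = 0

    def receivePaused(msg: Msg): Unit = msg match {
      case StateTimeout =>
        timeoutArmed = false // the real handler would start the etcd query here
      case PipedFailure(t) =>
        // Explicit failure handling: log, then retry on the next idle timeout
        // instead of stashing the failure until the next activation.
        log += s"keep/remove query failed: ${t.getMessage}, retrying"
        retries += 1
        timeoutArmed = true
      case other =>
        stash.enqueue(other) // other messages are still deferred as before
    }
  }
}
```

With this case in place, the failure never lands in the stash, so nothing stale is replayed when the container later transitions to `Running`.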
   
   

