bdoyle0182 commented on issue #5325:
URL: https://github.com/apache/openwhisk/issues/5325#issuecomment-1247467187
Actually it's a bit simpler: this is an unhandled failure case in the paused
state, which is the only place in the FunctionPullingContainerProxy that could
return an etcd error.
```
case Event(StateTimeout, data: WarmData) =>
  (for {
    count <- getLiveContainerCount(data.invocationNamespace, data.action.fullyQualifiedName(false), data.revision)
    (warmedContainerKeepingCount, warmedContainerKeepingTimeout) <- getWarmedContainerLimit(data.invocationNamespace)
  } yield {
    logging.info(
      this,
      s"Live container count: ${count}, warmed container keeping count configuration: ${warmedContainerKeepingCount} in namespace: ${data.invocationNamespace}")
    if (count <= warmedContainerKeepingCount) {
      Keep(warmedContainerKeepingTimeout)
    } else {
      Remove
    }
  }).pipeTo(self)
  stay
```
The state times out. The query to etcd fails, which pipes the failure message
back to the actor itself. However, the message is not handled in the paused
state, so it ends up getting stashed by the catch-all below, and the container
now sits around waiting indefinitely until a new activation comes in.
`case _ => delay`
A new activation comes in and tries to wake up the warmed container, playing
out all of the events I described in my previous messages, starting with the
attempt to unpause the container. It then transitions to `Running`, and
onTransition it unstashes the etcd failure message. So we just need to add
proper failure handling to the state timeout event for the paused state. As
additional confirmation that this is exactly what's happening, the gap in the
logs between when the container was paused and when the new activation comes in
is much greater than the idleTimeout.
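
As a rough sketch of what that failure handling could look like (this is not the actual patch): the future can be recovered into a concrete decision before it is piped to `self`, so a raw failure never reaches the catch-all. The `Decision` type, `decide`, and the `fallback` parameter are hypothetical stand-ins written against plain Scala futures rather than the real proxy, and which fallback is appropriate is an assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.control.NonFatal

// Hypothetical stand-ins for the proxy's messages; not the real OpenWhisk types.
sealed trait Decision
final case class Keep(timeout: FiniteDuration) extends Decision
case object Remove extends Decision

object PausedTimeoutSketch {
  // Combines the two lookups like the for-comprehension in the handler, then
  // maps any non-fatal failure to `fallback` so the result is always a Decision.
  def decide(liveCount: Future[Long],
             limit: Future[(Int, FiniteDuration)],
             fallback: Decision): Future[Decision] =
    (for {
      count <- liveCount
      limits <- limit
    } yield {
      val (keepingCount, keepingTimeout) = limits
      if (count <= keepingCount) Keep(keepingTimeout) else Remove
    }).recover { case NonFatal(_) => fallback }

  def main(args: Array[String]): Unit = {
    val limit = Future.successful((2, 10.minutes))
    // Both lookups succeed: the normal keep-or-remove decision is made.
    val ok = decide(Future.successful(1L), limit, Remove)
    // The count lookup fails (simulating an etcd error): the fallback is used.
    val failed = decide(Future.failed(new RuntimeException("etcd unavailable")), limit, Remove)
    println(Await.result(ok, 1.second))
    println(Await.result(failed, 1.second))
  }
}
```

In the real actor the recovered future would still be followed by `.pipeTo(self)`. An alternative would be to leave the future as is and handle the piped failure message explicitly in the paused state.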