bdoyle0182 commented on code in PR #5386:
URL: https://github.com/apache/openwhisk/pull/5386#discussion_r1127250263
##########
core/scheduler/src/main/scala/org/apache/openwhisk/core/scheduler/queue/MemoryQueue.scala:
##########
@@ -1201,6 +1207,13 @@ object MemoryQueue {
logging.info(
this,
s"[$invocationNamespace:$action:$stateName] some activations are stale msg: ${queue.head.msg.activationId}.")
+ val currentTime = Instant.now.toEpochMilli
+ if (currentTime - lastActivationExecutedTime.get() > maxRetentionMs) {
+ MetricEmitter.emitGaugeMetric(
+ LoggingMarkers
+ .SCHEDULER_QUEUE_NOT_PROCESSING(invocationNamespace, action.asString, action.toStringWithoutVersion),
Review Comment:
This metric should not fire when an activation request does not arrive only
because there are too few containers to keep up with the number of activations
in the queue. In that case _some_ activations are still making progress, so the
time since the last executed activation stays below the queue's retention
timeout and the check does not pass. The value of the metric is therefore
effectively a boolean, 1 or 0: either the action is having problems or it is
not.
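
A rough sketch of the semantics I have in mind, only for illustration: the
object `QueueNotProcessingSketch` and the helper `notProcessingGauge` are
hypothetical names and are not part of `MemoryQueue` or this PR. It assumes the
gauge is derived purely from the elapsed time since the last executed
activation.

```scala
object QueueNotProcessingSketch {
  // Hypothetical helper (not in the PR): returns 1 when no activation has been
  // executed within the retention window, and 0 otherwise.
  def notProcessingGauge(nowMs: Long, lastExecutedMs: Long, maxRetentionMs: Long): Int =
    if (nowMs - lastExecutedMs > maxRetentionMs) 1 else 0
}
```

With an assumed 60 second retention, a queue that executed an activation 10
seconds ago yields 0 even if other activations are stale, while a queue whose
last execution was 90 seconds ago yields 1.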
I can think about whether there is any additional data we should emit when we
hit this case. Separately, I think my comment above also includes a bug fix: a
log that emits the state of the queue is not being logged properly when
activations time out.