bdoyle0182 commented on code in PR #5386:
URL: https://github.com/apache/openwhisk/pull/5386#discussion_r1127250263


##########
core/scheduler/src/main/scala/org/apache/openwhisk/core/scheduler/queue/MemoryQueue.scala:
##########
@@ -1201,6 +1207,13 @@ object MemoryQueue {
       logging.info(
         this,
         s"[$invocationNamespace:$action:$stateName] some activations are stale msg: ${queue.head.msg.activationId}.")
+      val currentTime = Instant.now.toEpochMilli
+      if (currentTime - lastActivationExecutedTime.get() > maxRetentionMs) {
+        MetricEmitter.emitGaugeMetric(
+          LoggingMarkers
+            .SCHEDULER_QUEUE_NOT_PROCESSING(invocationNamespace, action.asString, action.toStringWithoutVersion),

Review Comment:
   This metric should not fire when an activation request does not arrive only 
because there are too few containers to keep up with the number of activations 
in the queue. In that case _some_ activations should still be making progress, 
so the time since the last executed activation stays below the queue's 
retention timeout and the check does not pass. The metric value is thus 
effectively a boolean, 1 or 0: either the action is having problems or it is 
not.
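
A minimal sketch of the check described above, for illustration only. The object `QueueNotProcessingSketch` and the helper `notProcessingGauge` are hypothetical names, not part of `MemoryQueue`, and it assumes `lastActivationExecutedTime` is an `AtomicLong` holding epoch milliseconds:

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper, not the actual MemoryQueue code.
object QueueNotProcessingSketch {

  // Returns 1 when no activation has executed within the retention window
  // (the queue looks stuck), and 0 when at least one has made progress.
  def notProcessingGauge(nowMs: Long,
                         lastActivationExecutedTime: AtomicLong,
                         maxRetentionMs: Long): Long =
    if (nowMs - lastActivationExecutedTime.get() > maxRetentionMs) 1L else 0L
}
```

With a retention of 5000 ms, a queue whose last activation ran 1000 ms ago yields 0 even if other activations are stale, while one whose last activation ran 9000 ms ago yields 1.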
   
   I can try to think about whether there is any additional data we should 
emit when we hit this case. In my comment above I think I also have a bug fix 
for a case where a log emitting the state of the queue is not properly logged 
when activations time out.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.