[jira] [Work logged] (GOBBLIN-1452) Add meters for successful/failed dags in total and by flowGroup

ASF GitHub Bot (Jira) Wed, 26 May 2021 18:31:11 -0700


     [ https://issues.apache.org/jira/browse/GOBBLIN-1452?focusedWorklogId=602694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602694 ]


ASF GitHub Bot logged work on GOBBLIN-1452:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/May/21 01:30
            Start Date: 27/May/21 01:30
    Worklog Time Spent: 10m 
      Work Description: jack-moseley commented on a change in pull request #3290:
URL: https://github.com/apache/gobblin/pull/3290#discussion_r640226670



##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java
##########
@@ -460,6 +463,10 @@ public synchronized void setActive(boolean active) {
         this.jobStatusPolledTimer = Optional.of(this.metricContext.timer(ServiceMetricNames.JOB_STATUS_POLLED_TIMER));
         ContextAwareGauge<Long> orchestrationDelayMetric = metricContext.newContextAwareGauge(ServiceMetricNames.FLOW_ORCHESTRATION_DELAY,
             () -> orchestrationDelay.get());
+        this.allSuccessfulMeter = metricContext.contextAwareMeter(

Review comment:
   See https://javadoc.io/doc/io.dropwizard.metrics/metrics-core/3.2.1/com/codahale/metrics/Meter.html
   
   It can return an "exponentially-weighted moving average rate" over the past 
1/5/15 minutes. That is not exactly a "number of flows": each time an event 
occurs the meter's rate increases, and then it gradually decays toward 0 as the 
window passes.
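   
   As a rough illustration of that rise-and-decay behavior, here is a minimal 
single-threaded sketch of an exponentially weighted moving average rate. The 
class name `EwmaSketch`, the 5-second tick, and the manual `tick()` call are 
assumptions made for this example; it is not the Dropwizard `Meter` itself, 
whose internals may differ.

```java
/** Simplified sketch of an exponentially weighted moving average (EWMA) rate. */
class EwmaSketch {
    // Assumed tick interval for this sketch, in seconds.
    private static final double TICK_SECONDS = 5.0;

    private final double alpha;          // smoothing factor derived from the window
    private long uncounted = 0;          // events marked since the last tick
    private double ratePerSecond = 0.0;  // current smoothed rate
    private boolean initialized = false;

    EwmaSketch(double windowMinutes) {
        this.alpha = 1.0 - Math.exp(-TICK_SECONDS / 60.0 / windowMinutes);
    }

    /** Record n events (for example, n failed flows). */
    void mark(long n) {
        uncounted += n;
    }

    /** Fold the events seen since the last tick into the smoothed rate. */
    void tick() {
        double instantRate = uncounted / TICK_SECONDS;
        uncounted = 0;
        if (initialized) {
            ratePerSecond += alpha * (instantRate - ratePerSecond);
        } else {
            ratePerSecond = instantRate;
            initialized = true;
        }
    }

    double getRate() {
        return ratePerSecond;
    }

    public static void main(String[] args) {
        EwmaSketch failures = new EwmaSketch(5.0);
        failures.mark(20);              // a burst of 20 failures
        failures.tick();
        System.out.println("right after the burst: " + failures.getRate());

        for (int i = 0; i < 60; i++) {  // 60 quiet ticks = 5 minutes
            failures.tick();
        }
        // The rate has decayed by a factor of e (4 * e^-1, about 1.47), not to 0.
        System.out.println("five quiet minutes later: " + failures.getRate());
    }
}
```

   In this sketch a burst shows up as a spike in the rate, and after one full 
window without events the rate has fallen to about 37% of its peak and keeps 
shrinking from there.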
   
   I thought it made more sense to use the existing meter concept than to build 
our own meter out of a gauge that we reset ourselves. And if we count the number 
of flows as we discussed before, I think the result is confusing: if you see the 
number 20 on a graph, does that mean 20 failures in the 5 minutes before that 
point, or 20 failures in a fixed 5-minute interval?
   
   With this I think we can still look for spikes in failures on the graph, or 
look at the ratio of success to failure meters to measure the health of the 
system.
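   
   One possible way to turn the two meter rates into a single health number is 
sketched below. The `FlowHealth` class and `failureFraction` method are 
hypothetical names for this example and are not part of Gobblin; the inputs are 
assumed to be the success and failure rates read from the two meters over the 
same window.

```java
/** Hypothetical helper that combines success and failure rates into one value. */
final class FlowHealth {
    private FlowHealth() {
    }

    /**
     * Returns the share of flows that failed, between 0.0 and 1.0.
     * When no flows finished at all, reports 0.0 instead of dividing by zero.
     */
    static double failureFraction(double successRate, double failureRate) {
        double total = successRate + failureRate;
        return total > 0.0 ? failureRate / total : 0.0;
    }

    public static void main(String[] args) {
        // Example: 3 successes/s and 1 failure/s means a quarter of flows fail.
        System.out.println(failureFraction(3.0, 1.0));
    }
}
```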




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 602694)
    Time Spent: 1h 10m  (was: 1h)

> Add meters for successful/failed dags in total and by flowGroup
> ---------------------------------------------------------------
>
>                 Key: GOBBLIN-1452
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1452
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Jack Moseley
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
