[
https://issues.apache.org/jira/browse/GOBBLIN-1452?focusedWorklogId=602694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602694
]
ASF GitHub Bot logged work on GOBBLIN-1452:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 27/May/21 01:30
Start Date: 27/May/21 01:30
Worklog Time Spent: 10m
Work Description: jack-moseley commented on a change in pull request
#3290:
URL: https://github.com/apache/gobblin/pull/3290#discussion_r640226670
##########
File path:
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java
##########
@@ -460,6 +463,10 @@ public synchronized void setActive(boolean active) {
this.jobStatusPolledTimer =
Optional.of(this.metricContext.timer(ServiceMetricNames.JOB_STATUS_POLLED_TIMER));
ContextAwareGauge<Long> orchestrationDelayMetric =
metricContext.newContextAwareGauge(ServiceMetricNames.FLOW_ORCHESTRATION_DELAY,
() -> orchestrationDelay.get());
+ this.allSuccessfulMeter = metricContext.contextAwareMeter(
Review comment:
See
https://javadoc.io/doc/io.dropwizard.metrics/metrics-core/3.2.1/com/codahale/metrics/Meter.html
A meter can report an "exponentially-weighted moving average rate" over the
past 1, 5, or 15 minutes. That is not exactly a "number of flows": each time an
event occurs the rate increases, and it then decays gradually toward 0 if no
further events occur.
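To make the decay behaviour concrete, here is a minimal sketch of an exponentially weighted moving average of an event rate. It is a simplified illustration, not the Dropwizard implementation: the class name `SimpleEwma`, the fixed 5-second tick, and the manual `tick()` call are all assumptions made for the example.

```java
/**
 * Simplified exponentially weighted moving average (EWMA) of an event rate.
 * Hypothetical illustration only; a real meter drives ticks from a clock.
 */
final class SimpleEwma {
    // Assumed tick interval: the rate is re-evaluated once per interval.
    private static final double TICK_SECONDS = 5.0;

    private final double alpha;      // smoothing factor derived from the window
    private long uncounted = 0;      // events seen since the last tick
    private double rate = 0.0;       // smoothed rate, in events per second
    private boolean initialized = false;

    SimpleEwma(double windowMinutes) {
        // Longer windows give a smaller alpha, so the rate moves more slowly.
        this.alpha = 1.0 - Math.exp(-TICK_SECONDS / (60.0 * windowMinutes));
    }

    /** Record n events; they are folded into the rate on the next tick. */
    void mark(long n) {
        uncounted += n;
    }

    /** Advance one interval and blend the interval's rate into the average. */
    void tick() {
        double instantRate = uncounted / TICK_SECONDS;
        uncounted = 0;
        if (initialized) {
            rate += alpha * (instantRate - rate);
        } else {
            rate = instantRate;
            initialized = true;
        }
    }

    double getRate() {
        return rate;
    }
}
```

With a 5-minute window, marking 20 events in one interval raises the rate, and each following quiet interval shrinks it by the same factor, so a burst shows up on a graph as a spike that fades rather than as a fixed count.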
I thought it made more sense to use the existing meter concept than to build
our own meter out of a gauge that we reset ourselves. And if we counted the
number of flows as we discussed before, I think the result would be confusing:
if you see the number 20 on a graph, does that mean 20 failures in the 5
minutes before that point, or 20 failures in a fixed 5-minute interval?
With this I think we can still look for spikes in failures on the graph, or
look at the ratio of success to failure meters to measure the health of the
system.
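As a rough sketch of the ratio idea, the helper below turns a success rate and a failure rate into a failure fraction. The class and method names are hypothetical, and the inputs are assumed to be rates read from the two meters (for example via `Meter.getFiveMinuteRate()`); nothing here is part of the pull request.

```java
/** Hypothetical helper for comparing success and failure meter rates. */
final class FlowHealth {
    private FlowHealth() {
    }

    /**
     * Returns the fraction of recent flows that failed, in the range [0, 1].
     * Both arguments are rates in the same unit (e.g. events per second).
     * When both rates are 0 there is no recent activity, so this returns 0.
     */
    static double failureFraction(double successRate, double failureRate) {
        double total = successRate + failureRate;
        return total == 0.0 ? 0.0 : failureRate / total;
    }
}
```

A fraction near 0 would suggest a healthy system, while a sustained rise would point to the kind of failure spike described above.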
Issue Time Tracking
-------------------
Worklog Id: (was: 602694)
Time Spent: 1h 10m (was: 1h)
> Add meters for successful/failed dags in total and by flowGroup
> ---------------------------------------------------------------
>
> Key: GOBBLIN-1452
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1452
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Jack Moseley
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>