[
https://issues.apache.org/jira/browse/GOBBLIN-1662?focusedWorklogId=781843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781843
]
ASF GitHub Bot logged work on GOBBLIN-1662:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 15/Jun/22 21:25
Start Date: 15/Jun/22 21:25
Worklog Time Spent: 10m
Work Description: Will-Lo commented on code in PR #3520:
URL: https://github.com/apache/gobblin/pull/3520#discussion_r898445055
##########
gobblin-service/src/test/java/org/apache/gobblin/service/modules/orchestration/DagManagerTest.java:
##########
@@ -990,10 +990,19 @@ public void testQuotasRetryFlow() throws URISyntaxException, IOException {
// Dag1 is running
this._dagManagerThread.run();
+ SortedMap<String, Counter> allCounters = metricContext.getParent().get().getCounters();
+ Assert.assertEquals(allCounters.get(MetricRegistry.name(
+ ServiceMetricNames.GOBBLIN_SERVICE_PREFIX,
+ ServiceMetricNames.SERVICE_USERS,
+ "user")).getCount(), 1);
// Dag1 fails and is orchestrated again
this._dagManagerThread.run();
// Dag1 is running again
this._dagManagerThread.run();
+ Assert.assertEquals(allCounters.get(MetricRegistry.name(
+ ServiceMetricNames.GOBBLIN_SERVICE_PREFIX,
+ ServiceMetricNames.SERVICE_USERS,
+ "user")).getCount(), 1);
Review Comment:
There was only 1 job; it fails and is retried via the GobblinTrackingEvents.
Before this change the count would be 2: because the FAILED event is never
propagated to the DagManager, the job would be marked as PENDING_RETRY here
and would attempt another increment. That is an overcount, so the test guards
against it.
The comment refers to the internal quota: once line 1007 is run, the quota
should reset to 0, and offering another dag would not run into the
exception.
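To illustrate the overcount described above, here is a minimal sketch of a per-user running counter that ignores repeat submissions of the same job. It is not the code from the PR: the class `RetrySafeUserCounter` and its methods `onSubmit`, `onFinish`, and `getCount` are hypothetical names, and it uses plain `java.util.concurrent` types rather than the metrics library the test exercises.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: counts running jobs per user and skips the increment
// when the same job is submitted again as a retry.
final class RetrySafeUserCounter {
  private final Map<String, AtomicLong> runningPerUser = new ConcurrentHashMap<>();
  // Job ids that have already been counted and have not finished yet.
  private final Set<String> countedJobs = ConcurrentHashMap.newKeySet();

  // Called on every submission, including retries.
  // Returns true only when the counter was actually incremented.
  boolean onSubmit(String user, String jobId) {
    if (!countedJobs.add(jobId)) {
      return false; // already counted: this submission is a retry
    }
    runningPerUser.computeIfAbsent(user, u -> new AtomicLong()).incrementAndGet();
    return true;
  }

  // Called once the job has finished for good (no further retries).
  void onFinish(String user, String jobId) {
    if (countedJobs.remove(jobId)) {
      AtomicLong count = runningPerUser.get(user);
      if (count != null) {
        count.decrementAndGet();
      }
    }
  }

  long getCount(String user) {
    AtomicLong count = runningPerUser.get(user);
    return count == null ? 0 : count.get();
  }

  public static void main(String[] args) {
    RetrySafeUserCounter counter = new RetrySafeUserCounter();
    counter.onSubmit("user", "job1"); // first run
    counter.onSubmit("user", "job1"); // retry of the same job
    System.out.println(counter.getCount("user")); // prints 1
  }
}
```

Without the `countedJobs` check, the second `onSubmit` call would raise the count to 2, which is the behavior the new assertions in the test are meant to catch.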
Issue Time Tracking
-------------------
Worklog Id: (was: 781843)
Time Spent: 1h 10m (was: 1h)
> Retried flows emit double running counts
> ----------------------------------------
>
> Key: GOBBLIN-1662
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1662
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: William Lo
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> When flows are retried automatically, the GaaS DagManager calls submitJob()
> again.
> The quota manager itself checks that the retried job submission does not
> duplicate the quota increment. However, this is not reflected in the metric,
> which always increments when the quota check passes and is not guarded
> against a duplicate increment due to retries.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)