[ 
https://issues.apache.org/jira/browse/GOBBLIN-1662?focusedWorklogId=781843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781843
 ]

ASF GitHub Bot logged work on GOBBLIN-1662:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Jun/22 21:25
            Start Date: 15/Jun/22 21:25
    Worklog Time Spent: 10m 
      Work Description: Will-Lo commented on code in PR #3520:
URL: https://github.com/apache/gobblin/pull/3520#discussion_r898445055


##########
gobblin-service/src/test/java/org/apache/gobblin/service/modules/orchestration/DagManagerTest.java:
##########
@@ -990,10 +990,19 @@ public void testQuotasRetryFlow() throws URISyntaxException, IOException {
 
     // Dag1 is running
     this._dagManagerThread.run();
+    SortedMap<String, Counter> allCounters = metricContext.getParent().get().getCounters();
+    Assert.assertEquals(allCounters.get(MetricRegistry.name(
+        ServiceMetricNames.GOBBLIN_SERVICE_PREFIX,
+        ServiceMetricNames.SERVICE_USERS,
+        "user")).getCount(), 1);
     // Dag1 fails and is orchestrated again
     this._dagManagerThread.run();
     // Dag1 is running again
     this._dagManagerThread.run();
+    Assert.assertEquals(allCounters.get(MetricRegistry.name(
+        ServiceMetricNames.GOBBLIN_SERVICE_PREFIX,
+        ServiceMetricNames.SERVICE_USERS,
+        "user")).getCount(), 1);

Review Comment:
   There is only one job; it fails and is retried via the 
GobblinTrackingEvents. Before this change the count would be 2: the FAILED 
event is never propagated to the DagManager, so the job is marked as 
PENDING_RETRY here and a second increment is attempted. That is an overcount, 
and the test guards against it.
   
   The comment refers to the internal quota. Once line 1007 has run, the 
quota should reset to 0, and offering another dag would not run into the 
exception.
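
   To illustrate the kind of guard described above, here is a minimal sketch 
that increments a per-user running counter only once per flow, even if the 
flow is submitted again on retry. It is not Gobblin's actual implementation: 
the class `RunningFlowCounter`, its methods, and the `"dag1"` flow id are 
hypothetical names, and a plain `AtomicLong` stands in for the metrics 
counter.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not the Gobblin quota manager or its metrics. */
class RunningFlowCounter {
    // Per-user count of running flows (stands in for the metric counter).
    private final Map<String, AtomicLong> countsByUser = new ConcurrentHashMap<>();
    // Flow ids that have already been counted, so a retry is not counted twice.
    private final Set<String> countedFlows = ConcurrentHashMap.newKeySet();

    /** Returns true if the counter was incremented, false if the flow was already counted. */
    boolean markRunning(String user, String flowId) {
        if (!countedFlows.add(flowId)) {
            return false; // retried submission of an already-counted flow
        }
        countsByUser.computeIfAbsent(user, u -> new AtomicLong()).incrementAndGet();
        return true;
    }

    /** Decrements the counter only if this flow was previously counted. */
    void markFinished(String user, String flowId) {
        if (countedFlows.remove(flowId)) {
            AtomicLong count = countsByUser.get(user);
            if (count != null) {
                count.decrementAndGet();
            }
        }
    }

    long getCount(String user) {
        AtomicLong count = countsByUser.get(user);
        return count == null ? 0 : count.get();
    }
}
```

   With this shape, calling `markRunning("user", "dag1")` twice leaves 
`getCount("user")` at 1, which mirrors what the two assertions in the test 
check before and after the retry.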





Issue Time Tracking
-------------------

    Worklog Id:     (was: 781843)
    Time Spent: 1h 10m  (was: 1h)

> Retried flows emit double running counts
> ----------------------------------------
>
>                 Key: GOBBLIN-1662
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1662
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: William Lo
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When flows are retried automatically, the GaaS DagManager calls 
> submitJob() again.
> The quota manager itself checks that the retried job submission does not 
> duplicate the quota increment. However, this is not reflected in the metric 
> itself, which always increments if the quota check passes and does not 
> guard against a duplicate increment due to retries.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
