[ 
https://issues.apache.org/jira/browse/GOBBLIN-1662?focusedWorklogId=780911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780911
 ]

ASF GitHub Bot logged work on GOBBLIN-1662:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Jun/22 18:36
            Start Date: 13/Jun/22 18:36
    Worklog Time Spent: 10m 
      Work Description: arjun4084346 commented on code in PR #3520:
URL: https://github.com/apache/gobblin/pull/3520#discussion_r896021375


##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java:
##########
@@ -963,7 +963,8 @@ private void submitJob(DagNode<JobExecutionPlan> dagNode) {
         // By this point the quota is allocated, so it's imperative to 
increment as missing would introduce the potential to decrement below zero upon 
quota release.
         // Quota release is guaranteed, despite failure, because exception 
handling within would mark the job FAILED.
         // When the ensuing kafka message spurs DagManager processing, the 
quota is released and the counts decremented
-        if (this.metricContext != null) {
+        // Ensure that we do not double increment for flows that are retried
+        if (this.metricContext != null && 
dagNode.getValue().getCurrentAttempts() == 1) {

Review Comment:
   Should we add this check while decrementing ? 
   where we have ```getRunningJobsCounter(dagNode).dec()```



##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java:
##########
@@ -963,7 +963,8 @@ private void submitJob(DagNode<JobExecutionPlan> dagNode) {
         // By this point the quota is allocated, so it's imperative to 
increment as missing would introduce the potential to decrement below zero upon 
quota release.
         // Quota release is guaranteed, despite failure, because exception 
handling within would mark the job FAILED.
         // When the ensuing kafka message spurs DagManager processing, the 
quota is released and the counts decremented
-        if (this.metricContext != null) {
+        // Ensure that we do not double increment for flows that are retried
+        if (this.metricContext != null && 
dagNode.getValue().getCurrentAttempts() == 1) {

Review Comment:
   Should we add this check while decrementing also? 
   where we have ```getRunningJobsCounter(dagNode).dec()```





Issue Time Tracking
-------------------

    Worklog Id:     (was: 780911)
    Time Spent: 0.5h  (was: 20m)

> Retried flows emit double running counts
> ----------------------------------------
>
>                 Key: GOBBLIN-1662
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1662
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: William Lo
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When flows are retried automatically, GaaS Dagmanager would perform 
> submitJob() function again.
> The quotamanager itself checks that the retried job submission would not 
> duplicate the quota increment, however this is not reflected in the metric 
> itself, which will always increment if the quota check passes but does not 
> guard against a duplicate increment due to retries.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to