[
https://issues.apache.org/jira/browse/GOBBLIN-1662?focusedWorklogId=781292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781292
]
ASF GitHub Bot logged work on GOBBLIN-1662:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 14/Jun/22 15:34
Start Date: 14/Jun/22 15:34
Worklog Time Spent: 10m
Work Description: Will-Lo commented on code in PR #3520:
URL: https://github.com/apache/gobblin/pull/3520#discussion_r896983202
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java:
##########
@@ -963,7 +963,8 @@ private void submitJob(DagNode<JobExecutionPlan> dagNode) {
// By this point the quota is allocated, so it's imperative to increment as missing would introduce the potential to decrement below zero upon quota release.
// Quota release is guaranteed, despite failure, because exception handling within would mark the job FAILED.
// When the ensuing kafka message spurs DagManager processing, the quota is released and the counts decremented
- if (this.metricContext != null) {
+ // Ensure that we do not double increment for flows that are retried
+ if (this.metricContext != null && dagNode.getValue().getCurrentAttempts() == 1) {
Review Comment:
No. The decrement does not need to check the attempt number, because the
count is only decremented when the job hits an end state, which happens on
the final attempt. If a job is retried automatically, its status is reported
as PENDING_RETRY rather than as a failed status, and it is resubmitted
through `submitJob()` instead of going through `onJobFinish()`, so the
decrement path is never reached on a retry.
https://github.com/apache/gobblin/blob/b726a606cea3deb567b1fdeeba9acbcc220e6d30/gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitor.java#L269
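To illustrate the reasoning above, here is a minimal sketch of the counting
behaviour, not the actual DagManager code. The class `RunningCountSketch` and
its methods are hypothetical stand-ins: `onSubmit` plays the role of the
guarded increment in `submitJob()`, and `onJobFinish` plays the role of the
single decrement that happens when the job reaches an end state.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the running-jobs metric; not Gobblin's real API. */
class RunningCountSketch {
    private final AtomicLong running = new AtomicLong();

    /** Guarded increment: only the first attempt of a job is counted. */
    void onSubmit(int currentAttempts) {
        if (currentAttempts == 1) {
            running.incrementAndGet();
        }
    }

    /** Unguarded decrement: assumed to be called once, when the job hits an end state. */
    void onJobFinish() {
        running.decrementAndGet();
    }

    long running() {
        return running.get();
    }

    public static void main(String[] args) {
        RunningCountSketch metric = new RunningCountSketch();
        metric.onSubmit(1);   // first attempt: counted
        metric.onSubmit(2);   // retry after PENDING_RETRY: not counted again
        System.out.println(metric.running()); // prints 1
        metric.onJobFinish(); // end state reached on the final attempt
        System.out.println(metric.running()); // prints 0
    }
}
```

Under these assumptions the count returns to zero after a retried job
finishes, without the decrement needing to know which attempt it was.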
Issue Time Tracking
-------------------
Worklog Id: (was: 781292)
Time Spent: 50m (was: 40m)
> Retried flows emit double running counts
> ----------------------------------------
>
> Key: GOBBLIN-1662
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1662
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: William Lo
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When a flow is retried automatically, the GaaS DagManager calls submitJob()
> again.
> The quota manager itself ensures that a retried job submission does not
> duplicate the quota increment. The metric has no such guard: it increments
> whenever the quota check passes, so a retried job is counted more than once.