Chen Guo created GOBBLIN-998:
--------------------------------
Summary: ExecutionStatus should be reset to PENDING before a job
retries
Key: GOBBLIN-998
URL: https://issues.apache.org/jira/browse/GOBBLIN-998
Project: Apache Gobblin
Issue Type: Bug
Reporter: Chen Guo
In KafkaJobStatusMonitor.modifyStateIfRetryRequired, when the state is FAILED
and currentAttempts < maxAttempts, the ExecutionStatus is set to RUNNING.
However, because of the change checked in for GOBBLIN-974
(https://github.com/apache/incubator-gobblin/blob/9f50a2563cc257039da44018663b6b9e119fb499/gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitor.java#L159),
a currentAttempts update carried by a lower-order event (such as ORCHESTRATED)
can no longer be consumed to update the jobState file. As a result,
DagManagerThread retries failed jobs indefinitely in pollAndAdvanceDag.
The solution is to set ExecutionStatus to PENDING instead of RUNNING, so that
the lower-order events emitted by the retry are accepted again.
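
A minimal sketch of the failure mode and the proposed fix, for illustration
only. It is not the Gobblin code: the Status enum, the JobState class, and the
applyEvent ordering guard are simplified, hypothetical stand-ins, and the
assumed ordering is PENDING < ORCHESTRATED < RUNNING < FAILED.

```java
public class RetryStatusSketch {
    // Hypothetical, simplified status ordering (declaration order = rank).
    // This is an assumption for illustration, not Gobblin's ExecutionStatus.
    enum Status { PENDING, ORCHESTRATED, RUNNING, FAILED }

    // Simplified stand-in for the persisted job state.
    static final class JobState {
        Status status;
        int currentAttempts;
        final int maxAttempts;

        JobState(Status status, int currentAttempts, int maxAttempts) {
            this.status = status;
            this.currentAttempts = currentAttempts;
            this.maxAttempts = maxAttempts;
        }
    }

    // Models an ordering guard: an event whose status ranks lower than the
    // stored status is dropped, so its currentAttempts value is never stored.
    static boolean applyEvent(JobState state, Status eventStatus, int eventAttempts) {
        if (eventStatus.ordinal() < state.status.ordinal()) {
            return false;
        }
        state.status = eventStatus;
        state.currentAttempts = eventAttempts;
        return true;
    }

    // Resets a failed job that still has attempts left to the given status.
    static void modifyStateIfRetryRequired(JobState state, Status resetTo) {
        if (state.status == Status.FAILED && state.currentAttempts < state.maxAttempts) {
            state.status = resetTo;
        }
    }

    // Returns true if the retry's ORCHESTRATED event (carrying attempt 2)
    // is accepted after the failed job has been reset to resetTo.
    static boolean retryEventAccepted(Status resetTo) {
        JobState state = new JobState(Status.FAILED, 1, 3);
        modifyStateIfRetryRequired(state, resetTo);
        return applyEvent(state, Status.ORCHESTRATED, 2);
    }

    public static void main(String[] args) {
        System.out.println("reset to RUNNING accepted: " + retryEventAccepted(Status.RUNNING));
        System.out.println("reset to PENDING accepted: " + retryEventAccepted(Status.PENDING));
    }
}
```

In this model, resetting to RUNNING causes the ORCHESTRATED event to be
dropped, so currentAttempts stays at 1 and the job keeps looking eligible for
retry. Resetting to PENDING lets the event through and the attempt count
advances.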