William Lo created GOBBLIN-1784:
-----------------------------------
Summary: Race condition where on service restart DagManager will
lose track of dags
Key: GOBBLIN-1784
URL: https://issues.apache.org/jira/browse/GOBBLIN-1784
Project: Apache Gobblin
Issue Type: Bug
Components: gobblin-service
Reporter: William Lo
Assignee: Abhishek Tiwari
Gobblin-as-a-Service has a bug where, on restart, the DagManager cleans up
dags even though their final flow event may never have been sent.
If the underlying notification system never delivers that event, the dag has
already been cleaned up, so its job status stays stuck in a running state
permanently.
The DagManager should therefore clean up its own reference to a dag only after
it reads that the jobstatus monitor has saved the final flow status. If no
final status has been received within some timeout (e.g. 5 minutes), the
DagManager should re-emit the event in case it was lost.
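A minimal Python sketch of the proposed cleanup logic follows. It is not Gobblin code. The class `DagCleanupTracker`, the callbacks `read_flow_status` and `emit_event`, the `FINAL_STATUSES` set and the 300-second default timeout (matching the 5-minute example) are all hypothetical. The caller passes `now` explicitly instead of reading a real clock, so the timing is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical set of terminal flow statuses; the real status names may differ.
FINAL_STATUSES = {"COMPLETE", "FAILED", "CANCELLED"}


@dataclass
class PendingCleanup:
    final_event: str  # the flow event emitted when the dag finished
    deadline: float   # time after which the event is re-emitted


class DagCleanupTracker:
    """Hypothetical sketch: keep a dag until its final status is persisted."""

    def __init__(self,
                 read_flow_status: Callable[[str], Optional[str]],
                 emit_event: Callable[[str, str], None],
                 timeout_seconds: float = 300.0):
        self.read_flow_status = read_flow_status  # reads what the status monitor saved
        self.emit_event = emit_event              # sends a flow event for a dag
        self.timeout_seconds = timeout_seconds
        self.dags: Dict[str, object] = {}
        self.pending: Dict[str, PendingCleanup] = {}

    def add_dag(self, dag_id: str, dag: object) -> None:
        self.dags[dag_id] = dag

    def on_dag_finished(self, dag_id: str, final_event: str, now: float) -> None:
        # Emit the final flow event but do NOT drop the dag reference yet.
        self.emit_event(dag_id, final_event)
        self.pending[dag_id] = PendingCleanup(final_event, now + self.timeout_seconds)

    def poll(self, now: float) -> None:
        for dag_id, entry in list(self.pending.items()):
            status = self.read_flow_status(dag_id)
            if status in FINAL_STATUSES:
                # The final status was saved, so it is now safe to forget the dag.
                del self.pending[dag_id]
                self.dags.pop(dag_id, None)
            elif now >= entry.deadline:
                # No final status within the timeout: assume the event was lost.
                self.emit_event(dag_id, entry.final_event)
                entry.deadline = now + self.timeout_seconds
```

Because a dag stays in `pending` until a final status is read back, a lost event is retried instead of leaving the job stuck as running. This sketch keeps `pending` in memory only. Surviving a service restart would additionally require rebuilding it from persisted state.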
--
This message was sent by Atlassian Jira
(v8.20.10#820010)