[ https://issues.apache.org/jira/browse/GOBBLIN-2011?focusedWorklogId=907874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-907874 ]
ASF GitHub Bot logged work on GOBBLIN-2011:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 02/Mar/24 00:24
Start Date: 02/Mar/24 00:24
Worklog Time Spent: 10m
Work Description: Will-Lo commented on PR #3888:
URL: https://github.com/apache/gobblin/pull/3888#issuecomment-1974127567
@arjun4084346 I agree that we should emit a new SKIPPED event, but we will
need to make sure that it is compatible with all clients, since it could cause
issues for clients that are set to expect only certain outputs.
Issue Time Tracking
-------------------
Worklog Id: (was: 907874)
Time Spent: 50m (was: 40m)
> Fix bug where concurrent flows can be kicked off depending on a jobstatus
> race condition
> ----------------------------------------------------------------------------------------
>
> Key: GOBBLIN-2011
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2011
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: William Lo
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> There's a bug that causes GaaS multileader to kick off unintended concurrent
> flows. It happens in the order described below:
> 1. Host A checks the latest flow execution status to ensure the prior flow is
> not running, and sees that the prior execution is still running.
> 2. Host A fails the pending flow execution because it cannot run a concurrent
> flow. This emits a FAILED event to GaaS, which is ingested by the
> JobStatusMonitor.
> 3. Host B checks the latest flow execution status and sees the current flow
> execution ID, which is FAILED (considered a finished flow).
> 4. Host B kicks off the pending flow execution when it shouldn't.
> To resolve this, we need to look at the past 2 flow executions and follow
> this behavior:
> 1. If there is no prior execution, kick off the pending flow.
> 2. If the prior execution is IN PROGRESS, indicate that there is a
> concurrent flow and block the pending execution.
> 3. If the prior execution is FINISHED, kick off the pending
> execution (rely on the DagManager for deduplication of flows because we do
> not know if the host managing this pending flow is running behind the other
> hosts).
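The three rules above can be sketched in Java roughly as follows. This is only an illustration, not the code from PR #3888: the class, the enums, and the decide method are hypothetical names, and it assumes that the two most recent executions are passed in newest first and that the pending execution's own status (for example the FAILED event emitted by another host) may be among them and has to be skipped.

```java
import java.util.List;

// Hypothetical sketch; none of these names come from the Gobblin code base.
class ConcurrentFlowCheck {
    // Simplified status set, assumed for illustration only.
    enum Status { RUNNING, COMPLETE, FAILED, CANCELLED }

    enum Decision { KICK_OFF, BLOCK_CONCURRENT }

    static final class FlowExecution {
        final long executionId;
        final Status status;

        FlowExecution(long executionId, Status status) {
            this.executionId = executionId;
            this.status = status;
        }
    }

    /**
     * Decides what to do with a pending flow execution.
     *
     * @param pendingExecutionId ID of the execution this host wants to start
     * @param latestTwo up to two most recent executions, newest first (assumption)
     */
    static Decision decide(long pendingExecutionId, List<FlowExecution> latestTwo) {
        for (FlowExecution execution : latestTwo) {
            if (execution.executionId == pendingExecutionId) {
                // This is the pending execution's own status, possibly a FAILED
                // event emitted by another host. It says nothing about the prior flow.
                continue;
            }
            // The first other execution is treated as the prior execution.
            if (execution.status == Status.RUNNING) {
                return Decision.BLOCK_CONCURRENT; // rule 2: prior is in progress
            }
            return Decision.KICK_OFF; // rule 3: prior is finished
        }
        return Decision.KICK_OFF; // rule 1: no prior execution
    }
}
```

In the race described above, Host B would see the pending execution marked FAILED followed by a prior execution that is still RUNNING, so this sketch returns BLOCK_CONCURRENT instead of kicking off the flow. Deduplication in the finished case is still left to the DagManager, as the description states.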
--
This message was sent by Atlassian Jira
(v8.20.10#820010)