[ 
https://issues.apache.org/jira/browse/GOBBLIN-2011?focusedWorklogId=907872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-907872
 ]

ASF GitHub Bot logged work on GOBBLIN-2011:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Mar/24 00:15
            Start Date: 02/Mar/24 00:15
    Worklog Time Spent: 10m 
      Work Description: arjun4084346 commented on PR #3888:
URL: https://github.com/apache/gobblin/pull/3888#issuecomment-1974119694

   I think we should not set the status to "Failed" when the last execution is 
still running. We should instead emit a new "SKIPPED" event. With that, any 
later execution can correctly decide whether a new job should run.
   This will also make the code more maintainable.
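
   As a rough illustration of the suggestion above (not code from the PR): if a
skipped execution carries its own status instead of "Failed", a later check can
step over it and look at the execution that was actually running. The class
name, the Status values other than SKIPPED and FAILED, and the method name are
all hypothetical and are not Gobblin APIs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Gobblin.
public class SkippedStatusSketch {

    // Assumed status values; only FAILED and SKIPPED are named in the thread.
    enum Status { RUNNING, COMPLETE, FAILED, SKIPPED }

    /**
     * Decides whether a new execution may start, given the statuses of
     * earlier executions ordered newest first. SKIPPED entries are ignored
     * so they cannot hide an execution that is still running.
     */
    static boolean mayStartNewExecution(List<Status> newestFirst) {
        for (Status s : newestFirst) {
            if (s == Status.SKIPPED) {
                continue; // a skipped execution says nothing about what is running
            }
            return s != Status.RUNNING;
        }
        return true; // no relevant prior execution
    }

    public static void main(String[] args) {
        // Skipped execution on top of a running one: still blocked.
        System.out.println(mayStartNewExecution(Arrays.asList(Status.SKIPPED, Status.RUNNING)));
        // Same situation recorded as FAILED: the running execution is hidden.
        System.out.println(mayStartNewExecution(Arrays.asList(Status.FAILED, Status.RUNNING)));
    }
}
```

   The second call shows the problem with reusing "Failed": it reads as a
finished execution, so the check allows a new run even though an older
execution is still in progress.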




Issue Time Tracking
-------------------

    Worklog Id:     (was: 907872)
    Time Spent: 40m  (was: 0.5h)

> Fix bug where concurrent flows can be kicked off depending on a jobstatus 
> race condition
> ----------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-2011
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2011
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: William Lo
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> There's a bug that causes GaaS multileader to kick off unintended concurrent 
> flows. It occurs in the following order:
> 1. Host A checks the latest flow execution status to ensure the prior flow is 
> not running, and sees that the prior execution is still running.
> 2. Host A fails the pending flow execution because it cannot run a concurrent 
> flow. This emits a FAILED event to GaaS, which is ingested by the 
> JobStatusMonitor.
> 3. Host B checks the latest flow execution status and sees the current flow 
> execution ID, which is FAILED (considered a finished flow).
> 4. Host B kicks off the pending flow execution when it should not.
> To resolve this, we need to look at the past 2 flow executions and apply the 
> following behavior:
> 1. If there is no prior execution, kick off the pending flow.
> 2. If the prior execution is IN PROGRESS, indicate that there is a 
> concurrent flow and block the pending execution.
> 3. If the prior execution is FINISHED, kick off the pending 
> execution (rely on the DagManager for deduplication of flows, because we do 
> not know whether the host managing this pending flow is running behind the 
> other hosts).
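
The three rules above can be sketched as a small decision function. This is a
minimal illustration, not the actual Gobblin change. It assumes that the "past
2 flow executions" may include the pending execution itself (as in step 3 of
the race), so the sketch filters it out by ID before applying the rules. All
class, enum, and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names come from Gobblin.
public class ConcurrentFlowCheck {

    enum State { IN_PROGRESS, FINISHED }

    enum Decision { KICK_OFF, BLOCK_AS_CONCURRENT }

    /** A flow execution reduced to its ID and a coarse state. */
    static final class Execution {
        final long id;
        final State state;

        Execution(long id, State state) {
            this.id = id;
            this.state = state;
        }
    }

    /** Finds the most recent execution that is not the pending one (list is newest first). */
    static Optional<Execution> priorExecution(List<Execution> latestTwo, long pendingId) {
        return latestTwo.stream().filter(e -> e.id != pendingId).findFirst();
    }

    /** Applies the three rules from the issue description. */
    static Decision decide(List<Execution> latestTwo, long pendingId) {
        Optional<Execution> prior = priorExecution(latestTwo, pendingId);
        if (!prior.isPresent()) {
            return Decision.KICK_OFF;            // rule 1: no prior execution
        }
        if (prior.get().state == State.IN_PROGRESS) {
            return Decision.BLOCK_AS_CONCURRENT; // rule 2: prior still running
        }
        return Decision.KICK_OFF;                // rule 3: prior finished
    }

    public static void main(String[] args) {
        // Pending execution 2 was already marked finished (FAILED) by another
        // host, but execution 1 is still running, so execution 2 stays blocked.
        List<Execution> latest = Arrays.asList(
            new Execution(2L, State.FINISHED),
            new Execution(1L, State.IN_PROGRESS));
        System.out.println(decide(latest, 2L)); // prints BLOCK_AS_CONCURRENT
    }
}
```

Because the pending execution's own status is skipped, a FAILED event emitted
by one host for that execution does not cause another host to treat the flow
as finished.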



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
