[ 
https://issues.apache.org/jira/browse/GOBBLIN-1982?focusedWorklogId=898821&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-898821
 ]

ASF GitHub Bot logged work on GOBBLIN-1982:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Jan/24 00:43
            Start Date: 10/Jan/24 00:43
    Worklog Time Spent: 10m 
      Work Description: umustafi opened a new pull request, #3854:
URL: https://github.com/apache/gobblin/pull/3854

   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [X] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
       - https://issues.apache.org/jira/browse/GOBBLIN-1982
   
   
   ### Description
   - [X] Here are some details about my PR, including screenshots (if 
applicable):
   The problem statement addressed in this issue is to determine a unique ID 
per execution that is agreed upon by all hosts and computed before returning any 
information (about compilation or execution) back to the user.
   
   Upon receiving the request for an `adhoc flow`, the recipient host creates a 
`flowExecutionId` when initializing `FlowSpec` from `config` for non-scheduled 
flows (see 
[code](https://jarvis.corp.linkedin.com/codesearch/result/?name=FlowConfigResourceLocalHandler.java&path=gobblin-elr%2Fgobblin-restli%2Fgobblin-flow-config-service%2Fgobblin-flow-config-service-server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fgobblin%2Fservice&reponame=linkedin%2Fgobblin-elr#276)).
 This `flowExecutionId` is returned to the user for tracking the flow status. 
This should not change later on.
   
   Scheduled flows are fired on each host at a different system clock time, 
so they need a consensus mechanism to coordinate between hosts. During 
`multiActiveLeaseArbitration` we update the `flowExecutionId` of a `DagAction` 
with an agreed-upon value from the database to gain this consistency. However, 
this replacement should only be done for scheduled flows, and no information 
about the `flowExecutionId` should be exposed externally until it has happened.
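   
The consensus step for scheduled flows can be pictured with a minimal sketch. This is not the Gobblin implementation: the class `ConsensusSketch`, the method `arbitrate`, the key format, and the in-memory map standing in for the database are all hypothetical, and the "first writer wins" rule is an assumption made only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the database consulted during lease arbitration. */
final class ConsensusSketch {
  // Key format is illustrative only; the real schema is not described here.
  private final Map<String, Long> agreedIds = new ConcurrentHashMap<>();

  /**
   * Returns the flowExecutionId every host should use for this trigger.
   * Assumption: the first host to record a value wins and the others adopt it.
   */
  long arbitrate(String triggerKey, long localTriggerMillis) {
    Long previous = agreedIds.putIfAbsent(triggerKey, localTriggerMillis);
    return previous == null ? localTriggerMillis : previous;
  }

  public static void main(String[] args) {
    ConsensusSketch store = new ConsensusSketch();
    // Two hosts fire the same scheduled trigger at different clock times.
    long hostA = store.arbitrate("group/flow/trigger-1", 1704847380000L);
    long hostB = store.arbitrate("group/flow/trigger-1", 1704847380042L);
    System.out.println(hostA == hostB); // prints "true"
  }
}
```

Both hosts end up with the same value even though their local clocks differ, which is the property the arbitration step is meant to provide.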
   
   To address the problems above we:
   1) skip `flowExecutionId` replacement for adhoc flows
   2) remove the flow compilation and `GTE` emission that ran before consensus 
on `flowExecutionId` was reached.
   
   Removing this compilation check has no significant impact. It will result in 
`dagActions` being created for flows that may fail compilation later (after 
lease arbitration and before execution). Since we already compile the flow on 
accepting it, we are okay with a slight delay in failing a flow.
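   
As a rough illustration of the first change, the choice of which `flowExecutionId` to keep could look like the sketch below. The class `FlowExecutionIdSketch`, the method `resolveFlowExecutionId`, and the `isAdhoc` flag are hypothetical names chosen for this example; the actual change in the PR may be structured differently.

```java
/** Hypothetical helper showing which flowExecutionId survives arbitration. */
final class FlowExecutionIdSketch {
  /**
   * @param originalId id created when the request or trigger was received
   * @param agreedId   id agreed upon via the database during arbitration
   * @param isAdhoc    true for adhoc flows, whose id was already returned to the user
   */
  static long resolveFlowExecutionId(long originalId, long agreedId, boolean isAdhoc) {
    // Adhoc flows keep their id; scheduled flows adopt the agreed value.
    return isAdhoc ? originalId : agreedId;
  }

  public static void main(String[] args) {
    System.out.println(resolveFlowExecutionId(111L, 999L, true));  // prints "111"
    System.out.println(resolveFlowExecutionId(111L, 999L, false)); // prints "999"
  }
}
```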
   
   ### Tests
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   The `flowExecutionId` replacement is tested by the existing unit test 
`testAcquireLeaseSingleParticipant`, and the new functionality is tested by 
`testSkipFlowExecutionIdReplacement`. 
   
   ### Commits
   - [X] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 898821)
    Remaining Estimate: 0h
            Time Spent: 10m

> Show a consistent flowExecutionId btwn Compilation & Execution 
> ---------------------------------------------------------------
>
>                 Key: GOBBLIN-1982
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1982
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-service
>            Reporter: Urmi Mustafi
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The problem statement addressed in this issue is to determine a unique ID per 
> execution that is agreed upon by all hosts and computed before returning any 
> information (about compilation or execution) back to the user.
> Upon receiving the request for an adhoc flow, the recipient host creates a 
> flowExecutionId when initializing FlowSpec from config for non-scheduled 
> flows (see 
> [code|https://jarvis.corp.linkedin.com/codesearch/result/?name=FlowConfigResourceLocalHandler.java&path=gobblin-elr%2Fgobblin-restli%2Fgobblin-flow-config-service%2Fgobblin-flow-config-service-server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fgobblin%2Fservice&reponame=linkedin%2Fgobblin-elr#276]).
>  This flowExecutionId is returned to the user for tracking the flow status. 
> This should not change later on.
> Scheduled flows are fired on each host at a different system clock time, so 
> they need a consensus mechanism to coordinate between hosts. During 
> multiActiveLeaseArbitration we update the flowExecutionId of a DagAction with 
> an agreed-upon value from the database to gain this consistency. However, 
> this replacement should only be done for scheduled flows, and no information 
> about the flowExecutionId should be exposed externally until it has happened.
> To address the problems above we:
> 1) skip flowExecutionId replacement for adhoc flows
> 2) remove the flow compilation and GTE emission that ran before consensus on 
> flowExecutionId was reached.
> Removing this compilation check has no significant impact. It will result in 
> dagActions being created for flows that may fail compilation later (after 
> lease arbitration and before execution). Since we already compile the flow on 
> accepting it, we are okay with a slight delay in failing a flow. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
