[ 
https://issues.apache.org/jira/browse/GOBBLIN-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Lo updated GOBBLIN-1865:
--------------------------------
    Description: 
With the "gobblin.cluster.job.useGeneratedJobIds" configuration, jobs with that 
prefix should be using the system timestamp of the Gobblin cluster instead of a 
provided flow execution ID.

Instead, it is more consistent to append the flowExecutionId to the jobName 
and then append a timestamp after that, so that all earlystop jobs relating to 
a flow execution can be tracked.

Now jobNames should have the following structure:
job_ActualJob<jobName>_<flowExecutionId>_<timestamp>

The timestamp is needed so that Helix can run concurrent jobs given a job ID.
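
A minimal sketch of the naming scheme described above, for illustration only. 
The class and method names (JobNameSketch, buildJobId, flowExecutionPrefix) are 
hypothetical and are not taken from the Gobblin code base. The sketch assumes a 
plain "job_" prefix and numeric IDs, and it does not try to interpret the 
"ActualJob" part of the pattern.

```java
// Hypothetical illustration of the proposed job ID layout; not Gobblin's actual API.
final class JobNameSketch {
    // Assumed literal prefix for generated job IDs.
    private static final String PREFIX = "job_";

    // Builds <prefix><jobName>_<flowExecutionId>_<timestamp>.
    static String buildJobId(String jobName, long flowExecutionId, long timestampMillis) {
        return flowExecutionPrefix(jobName, flowExecutionId) + timestampMillis;
    }

    // Common prefix shared by every job ID that belongs to one flow execution,
    // which is what makes the related jobs trackable as a group.
    static String flowExecutionPrefix(String jobName, long flowExecutionId) {
        return PREFIX + jobName + "_" + flowExecutionId + "_";
    }

    public static void main(String[] args) {
        long flowExecutionId = 1234L; // made-up value for the example
        String first = buildJobId("myJob", flowExecutionId, 1000L);
        String second = buildJobId("myJob", flowExecutionId, 2000L);

        // Different timestamps give distinct IDs for the same flow execution.
        System.out.println(first);
        System.out.println(second);
        System.out.println(first.startsWith(flowExecutionPrefix("myJob", flowExecutionId)));
    }
}
```

Two launches for the same flow execution get different IDs because of the 
timestamp suffix, while both can still be matched by the shared 
<jobName>_<flowExecutionId>_ prefix.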

  was:
With "gobblin.cluster.job.useGeneratedJobIds" configuration, jobs with that 
prefix should be using the system timestamp of Gobblin cluster instead of a 
provided flow execution ID.

Instead of this, it is more consistent to append flowExecutionId to a jobName 
then append a timestamp on top of that, so that all earlystop jobs relating to 
a flow execution can be tracked.

Now jobNames should have the following structure:
job_ActualJob<jobName>_<flowExecutionId>_<timestamp>


> Fix bug where overriding job execution IDs causes issues with earlystop jobs 
> and job tracking
> -------------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1865
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1865
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-cluster
>            Reporter: William Lo
>            Assignee: Hung Tran
>            Priority: Major
>
> With the "gobblin.cluster.job.useGeneratedJobIds" configuration, jobs with that 
> prefix should be using the system timestamp of the Gobblin cluster instead of a 
> provided flow execution ID.
> Instead, it is more consistent to append the flowExecutionId to the jobName 
> and then append a timestamp after that, so that all earlystop jobs relating 
> to a flow execution can be tracked.
> Now jobNames should have the following structure:
> job_ActualJob<jobName>_<flowExecutionId>_<timestamp>
> The timestamp is needed so that Helix can run concurrent jobs given a job ID.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
