[ https://issues.apache.org/jira/browse/GOBBLIN-1857?focusedWorklogId=871884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-871884 ]

ASF GitHub Bot logged work on GOBBLIN-1857:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Jul/23 22:40
            Start Date: 19/Jul/23 22:40
    Worklog Time Spent: 10m 
      Work Description: umustafi commented on code in PR #3719:
URL: https://github.com/apache/gobblin/pull/3719#discussion_r1268749194


##########
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinClusterConfigurationKeys.java:
##########
@@ -180,6 +180,10 @@ public class GobblinClusterConfigurationKeys {
   public static final String CANCEL_RUNNING_JOB_ON_DELETE = GOBBLIN_CLUSTER_PREFIX + "job.cancelRunningJobOnDelete";
   public static final String DEFAULT_CANCEL_RUNNING_JOB_ON_DELETE = "false";
 
+  // Job Execution ID for Helix jobs is inferred from Flow Execution IDs, but there are scenarios in earlyStop jobs where
+  // this behavior needs to be avoided due to concurrent planning and acutal jobs sharing the same execution ID

Review Comment:
   small typo: `acutal` should be `actual`



##########
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/HelixJobsMapping.java:
##########
@@ -96,15 +96,17 @@ public HelixJobsMapping(Config sysConfig, URI fsUri, String rootDir) {
   }
 
   public static String createPlanningJobId (Properties jobPlanningProps) {
+    long planningJobId = PropertiesUtils.getPropAsBoolean(jobPlanningProps, GobblinClusterConfigurationKeys.USE_GENERATED_JOBEXECUTION_IDS, "false") ?
+        System.currentTimeMillis() : PropertiesUtils.getPropAsLong(jobPlanningProps, ConfigurationKeys.FLOW_EXECUTION_ID_KEY, System.currentTimeMillis());

Review Comment:
   In non-earlyStop scenarios, is there an advantage to reusing the flow
   execution ID as the job ID, or is it just to make the job easier to
   identify? In the earlyStop case, is there a log line or something similar
   we can use to track the job ID? Otherwise, how do we make the association?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 871884)
    Time Spent: 0.5h  (was: 20m)

> Allow override key for job execution ID in Helix Gobblin Cluster
> ----------------------------------------------------------------
>
>                 Key: GOBBLIN-1857
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1857
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-cluster
>            Reporter: William Lo
>            Assignee: Hung Tran
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The job execution ID is automatically inferred from the flow execution ID if
> the job was orchestrated by Gobblin-as-a-Service. However, this can lead to
> bugs in combination with other job keys such as earlyStop, which creates
> multiple planning jobs and job IDs that are sent to Helix. Those jobs then
> share the same execution ID, which Helix rejects.
> When those configurations are set, we want to force the Gobblin cluster to
> fall back to generating the job execution ID from the current timestamp.
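
The ID selection described in the issue, and implemented by the ternary in `createPlanningJobId` in the PR diff, can be sketched in isolation. This is a minimal, self-contained Java sketch and not Gobblin's code. The class `PlanningJobIdSketch`, the method `resolveExecutionId`, the injected `LongSupplier` clock, and the key string used for the override flag are hypothetical stand-ins. `flow.executionId` is assumed to be the value of `ConfigurationKeys.FLOW_EXECUTION_ID_KEY`. The sketch only reproduces the choice of the numeric ID, not how the final planning job ID string is built.

```java
import java.util.Properties;
import java.util.function.LongSupplier;

public class PlanningJobIdSketch {
  // Assumed value of ConfigurationKeys.FLOW_EXECUTION_ID_KEY.
  public static final String FLOW_EXECUTION_ID_KEY = "flow.executionId";
  // Hypothetical key string standing in for
  // GobblinClusterConfigurationKeys.USE_GENERATED_JOBEXECUTION_IDS.
  public static final String USE_GENERATED_JOBEXECUTION_IDS = "example.useGeneratedJobExecutionIds";

  // Roughly mirrors the ternary in the diff: if the override flag is set, take
  // the clock value; otherwise reuse the flow execution ID, falling back to the
  // clock when no flow execution ID is present.
  public static long resolveExecutionId(Properties props, LongSupplier clock) {
    boolean useGenerated =
        Boolean.parseBoolean(props.getProperty(USE_GENERATED_JOBEXECUTION_IDS, "false"));
    if (useGenerated) {
      return clock.getAsLong();
    }
    String flowExecutionId = props.getProperty(FLOW_EXECUTION_ID_KEY);
    return flowExecutionId == null ? clock.getAsLong() : Long.parseLong(flowExecutionId);
  }

  public static void main(String[] args) {
    // Fake clock that advances 1 ms per call, so the output is deterministic.
    long[] now = {1_689_806_460_000L};
    LongSupplier clock = () -> now[0]++;

    Properties flowProps = new Properties();
    flowProps.setProperty(FLOW_EXECUTION_ID_KEY, "1689806400000");

    // Default path: two planning jobs of the same flow execution (as with
    // earlyStop) both reuse the flow execution ID, so their IDs collide.
    long first = resolveExecutionId(flowProps, clock);
    long second = resolveExecutionId(flowProps, clock);
    System.out.println("default:  " + first + " " + second + " collide=" + (first == second));

    // Override set: each planning job takes its own clock-based ID.
    Properties overrideProps = new Properties();
    overrideProps.putAll(flowProps);
    overrideProps.setProperty(USE_GENERATED_JOBEXECUTION_IDS, "true");
    long third = resolveExecutionId(overrideProps, clock);
    long fourth = resolveExecutionId(overrideProps, clock);
    System.out.println("override: " + third + " " + fourth + " collide=" + (third == fourth));
  }
}
```

On the default path, `main` prints the same ID twice. That duplicate is what Helix rejects. With the override flag set, `main` prints two distinct IDs. The fake clock ticks once per call only to keep the example deterministic. Real `System.currentTimeMillis()` values can still coincide for jobs created within the same millisecond.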



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
