[
https://issues.apache.org/jira/browse/GOBBLIN-1857?focusedWorklogId=871884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-871884
]
ASF GitHub Bot logged work on GOBBLIN-1857:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 19/Jul/23 22:40
Start Date: 19/Jul/23 22:40
Worklog Time Spent: 10m
Work Description: umustafi commented on code in PR #3719:
URL: https://github.com/apache/gobblin/pull/3719#discussion_r1268749194
##########
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinClusterConfigurationKeys.java:
##########
@@ -180,6 +180,10 @@ public class GobblinClusterConfigurationKeys {
public static final String CANCEL_RUNNING_JOB_ON_DELETE =
GOBBLIN_CLUSTER_PREFIX + "job.cancelRunningJobOnDelete";
public static final String DEFAULT_CANCEL_RUNNING_JOB_ON_DELETE = "false";
+ // Job Execution ID for Helix jobs is inferred from Flow Execution IDs, but there are scenarios in earlyStop jobs where
+ // this behavior needs to be avoided due to concurrent planning and acutal jobs sharing the same execution ID
Review Comment:
Small typo: `acutal` should be `actual`.
##########
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/HelixJobsMapping.java:
##########
@@ -96,15 +96,17 @@ public HelixJobsMapping(Config sysConfig, URI fsUri, String rootDir) {
}
public static String createPlanningJobId (Properties jobPlanningProps) {
+ long planningJobId = PropertiesUtils.getPropAsBoolean(jobPlanningProps, GobblinClusterConfigurationKeys.USE_GENERATED_JOBEXECUTION_IDS, "false") ?
+ System.currentTimeMillis() : PropertiesUtils.getPropAsLong(jobPlanningProps, ConfigurationKeys.FLOW_EXECUTION_ID_KEY, System.currentTimeMillis());
Review Comment:
In non-earlyStop scenarios, is there an advantage to reusing the flow
execution ID as the job ID? Is this just to make the job easier to identify?
In the earlyStop case, is there a log line or something similar we can use to
track the job ID, or how do we make the association?
Issue Time Tracking
-------------------
Worklog Id: (was: 871884)
Time Spent: 0.5h (was: 20m)
> Allow override key for job execution ID in Helix Gobblin Cluster
> ----------------------------------------------------------------
>
> Key: GOBBLIN-1857
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1857
> Project: Apache Gobblin
> Issue Type: Bug
> Components: gobblin-cluster
> Reporter: William Lo
> Assignee: Hung Tran
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Job Execution ID is automatically inferred from the flow execution ID if the
> job was orchestrated by Gobblin-as-a-Service. However, this can lead to bugs
> in conjunction with other job keys such as earlyStop, since it would create
> multiple planningJobs and job IDs sent to Helix. These jobs would then have
> the same execution ID, which Helix rejects.
> When those configurations are set, we want to force the Gobblin cluster to
> default to creating the job execution ID from its timestamp.
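
The selection logic described above can be sketched in isolation. This is a minimal illustration, not the code from the PR: the class name, method name, and both property key strings are hypothetical placeholders, and plain `java.util.Properties` stands in for Gobblin's `PropertiesUtils` helpers. It assumes the intended behavior is "use a timestamp when the override flag is set or no flow execution ID is present, otherwise reuse the flow execution ID".

```java
import java.util.Properties;
import java.util.function.LongSupplier;

// Hypothetical stand-alone sketch; none of these names come from Gobblin itself.
final class PlanningJobIdSketch {
  // Placeholder key strings (assumptions). The real values are defined in
  // Gobblin's configuration key classes and are not reproduced here.
  static final String USE_GENERATED_IDS_KEY = "example.useGeneratedJobExecutionIds";
  static final String FLOW_EXECUTION_ID_KEY = "example.flowExecutionId";

  // Returns the execution ID to use for a job.
  // The clock is injected so the timestamp branch can be tested deterministically.
  static long chooseExecutionId(Properties props, LongSupplier clock) {
    boolean useGenerated =
        Boolean.parseBoolean(props.getProperty(USE_GENERATED_IDS_KEY, "false"));
    String flowExecutionId = props.getProperty(FLOW_EXECUTION_ID_KEY);

    // Override set, or no flow execution ID available: fall back to a timestamp.
    if (useGenerated || flowExecutionId == null) {
      return clock.getAsLong();
    }
    // Otherwise reuse the flow execution ID as the job execution ID.
    return Long.parseLong(flowExecutionId);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(FLOW_EXECUTION_ID_KEY, "12345");
    System.out.println(chooseExecutionId(props, System::currentTimeMillis));

    // With the override enabled, the flow execution ID is ignored.
    props.setProperty(USE_GENERATED_IDS_KEY, "true");
    System.out.println(chooseExecutionId(props, System::currentTimeMillis));
  }
}
```

Note that with millisecond timestamps, two IDs generated in the same millisecond would still collide. Whether that matters in practice depends on how closely together the planning job and the actual job are created.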
--
This message was sent by Atlassian Jira
(v8.20.10#820010)