[ https://issues.apache.org/jira/browse/GOBBLIN-1692?focusedWorklogId=807173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-807173 ]
ASF GitHub Bot logged work on GOBBLIN-1692:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 08/Sep/22 20:01
Start Date: 08/Sep/22 20:01
Worklog Time Spent: 10m
Work Description: homatthew commented on code in PR #3546:
URL: https://github.com/apache/gobblin/pull/3546#discussion_r966371603
##########
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/HelixUtils.java:
##########
@@ -278,6 +278,7 @@ static void waitJobCompletion(HelixManager helixManager, String workFlowName, St
       case STOPPING:
         log.info("Waiting for job {} to complete... State - {}", jobName, jobState);
         Thread.sleep(TimeUnit.SECONDS.toMillis(1L));
+        // TODO: fix the incorrect stoppingStateEndTime, and revisit GOBBLIN-1692
Review Comment:
Ideally we should have a Gobblin ticket for this TODO that outlines why we
didn't do it now and when we should do it in the future.
Issue Time Tracking
-------------------
Worklog Id: (was: 807173)
Time Spent: 20m (was: 10m)
> Make GobblinHelixJobScheduler stop Helix workflow asynchronously
> ----------------------------------------------------------------
>
> Key: GOBBLIN-1692
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1692
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-cluster
> Reporter: Hanghang Liu
> Assignee: Hung Tran
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When handleUpdateJobConfigArrival is triggered by a newly posted job config,
> GobblinHelixJobScheduler first stops and deletes the old job, then tries to
> spin up the updated Helix workflow.
> The job scheduler performs the stop synchronously, with a default timeout of
> 10 seconds. However, the stop consistently takes longer than that timeout in
> Helix, so the job state is not correctly updated to stopped. As a result, when
> the GobblinHelixJobLauncher is constructed, the previous job is still in the
> wrong state because jobRunningMap has not been updated yet, and the new job
> is not launched. So we always see this log: {{Job {} will not be executed
> because other jobs are still running}}.
> We can make the job deletion asynchronous and let the waitForJobCompletion
> method ensure that the job status is eventually updated correctly.
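
The proposal above (request the stop without blocking, then poll until the state settles) could look roughly like the sketch below. It is only an illustration and not the Gobblin or Helix implementation: the class AsyncStopSketch, the JobState enum, and the methods requestStopAsync and waitUntilStopped are hypothetical names, and the state is read through a plain Supplier rather than through any Helix API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from Gobblin or Helix.
final class AsyncStopSketch {

    // Simplified stand-in for a job state; the real Helix enum has more values.
    enum JobState { IN_PROGRESS, STOPPING, STOPPED }

    // Fires the stop request on another thread so the caller is not blocked
    // by a slow stop (the problem described for the synchronous version).
    static CompletableFuture<Void> requestStopAsync(Runnable stopRequest) {
        return CompletableFuture.runAsync(stopRequest);
    }

    // Polls the supplied state until it is STOPPED, or fails after timeoutMillis.
    static void waitUntilStopped(Supplier<JobState> stateSupplier,
                                 long timeoutMillis,
                                 long pollMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (stateSupplier.get() == JobState.STOPPED) {
                return;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("job did not reach STOPPED within "
                        + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicReference<JobState> state = new AtomicReference<>(JobState.STOPPING);

        // Simulated slow stop: the state flips to STOPPED after a short delay.
        requestStopAsync(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            state.set(JobState.STOPPED);
        });

        // The caller is free to do other work here; it only waits when it
        // actually needs the old job to be gone.
        waitUntilStopped(state::get, 5_000L, 10L);
        System.out.println("Job state: " + state.get());
    }
}
```

The point of the split is that the timeout applies to the wait for the final state and not to the stop call itself, so a slow stop no longer leaves the caller with a stale view of whether the old job is still running.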
--
This message was sent by Atlassian Jira
(v8.20.10#820010)