[
https://issues.apache.org/jira/browse/GOBBLIN-1702?focusedWorklogId=807214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-807214
]
ASF GitHub Bot logged work on GOBBLIN-1702:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 08/Sep/22 23:06
Start Date: 08/Sep/22 23:06
Worklog Time Spent: 10m
Work Description: homatthew commented on code in PR #3556:
URL: https://github.com/apache/gobblin/pull/3556#discussion_r966489907
##########
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/HelixUtils.java:
##########
@@ -278,9 +278,12 @@ static void waitJobCompletion(HelixManager helixManager, String workFlowName, St
case STOPPING:
log.info("Waiting for job {} to complete... State - {}", jobName, jobState);
Thread.sleep(TimeUnit.SECONDS.toMillis(1L));
+ if (stoppingStateEndTime == 0) {
+ stoppingStateEndTime = currentTimeMillis + stoppingStateTimeoutInSeconds * 1000;
+ }
// Workaround for a Helix bug where a job may be stuck in the STOPPING state due to an unresponsive task.
- if (System.currentTimeMillis() > stoppingStateEndTime) {
- log.info("Deleting workflow {}", workFlowName);
+ if (stoppingStateEndTime != 0 && System.currentTimeMillis() > stoppingStateEndTime) {
Review Comment:
Nit: the `stoppingStateEndTime != 0` check is unnecessary; it can't be 0 here
because of line 281.
##########
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/HelixUtils.java:
##########
@@ -260,7 +260,7 @@ static void waitJobCompletion(HelixManager helixManager, String workFlowName, St
endTime = currentTimeMillis + timeoutInSeconds.get() * 1000;
Review Comment:
`currentTimeMillis` should be renamed to something like start time, because it
actually denotes the start of the job and not the current time. If we need the
current time, just use the system time.
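
For illustration only, the two comments above can be read together as: arm the STOPPING deadline lazily on the first STOPPING poll, and measure it against a fresh clock reading rather than the timestamp captured when the wait began. The sketch below is not the code from PR #3556; `StoppingDeadline`, `expiredOnStoppingPoll`, and the injected clock are hypothetical names introduced here so the logic can be exercised without Helix or real sleeps.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of Gobblin): tracks how long a job has been seen in STOPPING. */
final class StoppingDeadline {
  private final long timeoutMillis;
  private final LongSupplier clock;   // assumption: injected clock, e.g. System::currentTimeMillis
  private long deadlineMillis = 0L;   // 0 means STOPPING has not been observed yet

  StoppingDeadline(long timeoutInSeconds, LongSupplier clock) {
    this.timeoutMillis = TimeUnit.SECONDS.toMillis(timeoutInSeconds);
    this.clock = clock;
  }

  /**
   * Call on every poll that observes the STOPPING state. Returns true once the
   * job has stayed in STOPPING strictly longer than the timeout, which is the
   * point where a caller might give up and delete the workflow.
   */
  boolean expiredOnStoppingPoll() {
    long now = clock.getAsLong();
    if (deadlineMillis == 0L) {
      // First STOPPING poll: start the countdown from "now", not from job start.
      deadlineMillis = now + timeoutMillis;
    }
    return now > deadlineMillis;
  }
}
```

In a polling loop this would be constructed once with `System::currentTimeMillis` and consulted in the `STOPPING` branch; because the deadline is always set before it is compared, no separate `!= 0` guard is needed.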
Issue Time Tracking
-------------------
Worklog Id: (was: 807214)
Time Spent: 0.5h (was: 20m)
> Fix Bug when wait and checking helix job state till completion
> --------------------------------------------------------------
>
> Key: GOBBLIN-1702
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1702
> Project: Apache Gobblin
> Issue Type: Bug
> Components: gobblin-cluster
> Reporter: Hanghang Liu
> Assignee: Hung Tran
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently HelixUtils.waitJobCompletion() has a bug: when a job is in the
> STOPPING state, it immediately tries to delete it instead of waiting for the
> job itself to transition to the STOPPED state, because stoppingStateEndTime
> is not set correctly.