[
https://issues.apache.org/jira/browse/GOBBLIN-1702?focusedWorklogId=807225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-807225
]
ASF GitHub Bot logged work on GOBBLIN-1702:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 08/Sep/22 23:53
Start Date: 08/Sep/22 23:53
Worklog Time Spent: 10m
Work Description: ZihanLi58 commented on code in PR #3556:
URL: https://github.com/apache/gobblin/pull/3556#discussion_r966508754
##########
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/HelixUtils.java:
##########
@@ -278,13 +278,16 @@ static void waitJobCompletion(HelixManager helixManager, String workFlowName, St
case STOPPING:
log.info("Waiting for job {} to complete... State - {}", jobName, jobState);
Thread.sleep(TimeUnit.SECONDS.toMillis(1L));
+ if (stoppingStateEndTime == null) {
+ stoppingStateEndTime = jobStartTimeMillis + stoppingStateTimeoutInSeconds * 1000;
Review Comment:
Why start from jobStartTimeMillis? Shouldn't it be the current timestamp?
It should be the time we first see the job in the STOPPING state plus the
stopping-state timeout.
Review Comment:
Could you also add a log line here indicating that we have seen the job in the
STOPPING state, so you can test whether your fix works?
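The two review suggestions above can be combined into a small sketch: start the stopping-state timeout at the first observation of STOPPING, and log that observation. This is a minimal, self-contained approximation, not the Gobblin code. The `JobState` enum, the `waitForStop` method, and the injected state source and clock are hypothetical stand-ins for the Helix job-state lookup and `System.currentTimeMillis()`, and the sketch uses `java.util.logging` instead of Gobblin's logger. It assumes the caller falls back to deleting the job when `waitForStop` returns `true`.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;
import java.util.logging.Logger;

public class StoppingStateWait {
  private static final Logger LOG = Logger.getLogger(StoppingStateWait.class.getName());

  // Hypothetical stand-in for the Helix job states relevant to this loop.
  enum JobState { IN_PROGRESS, STOPPING, STOPPED, COMPLETED }

  /**
   * Polls the job state until it reaches a terminal state or the stopping-state timeout expires.
   * Returns true if the timeout expired while the job was still STOPPING, false otherwise.
   */
  static boolean waitForStop(String jobName, Supplier<JobState> stateSource, LongSupplier clock,
      long stoppingStateTimeoutInSeconds, long pollMillis) {
    Long stoppingStateEndTime = null; // unset until STOPPING is first observed
    while (true) {
      JobState jobState = stateSource.get();
      switch (jobState) {
        case STOPPING:
          if (stoppingStateEndTime == null) {
            // Anchor the timeout at the first STOPPING observation, not at job start.
            stoppingStateEndTime = clock.getAsLong() + stoppingStateTimeoutInSeconds * 1000;
            LOG.info("Job " + jobName + " seen in STOPPING state; waiting until " + stoppingStateEndTime);
          }
          if (clock.getAsLong() > stoppingStateEndTime) {
            return true; // still STOPPING after the full timeout
          }
          break;
        case STOPPED:
        case COMPLETED:
          return false; // job reached a terminal state on its own
        default:
          break; // still running; keep polling
      }
      sleep(pollMillis);
    }
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status for callers
      throw new IllegalStateException("Interrupted while waiting for job", e);
    }
  }
}
```

Injecting the clock and the state source keeps the timeout logic testable without a Helix cluster: a fake clock that advances one second per poll is enough to check that a job reaching STOPPED within the timeout is not treated as stuck. The log line gives the signal the reviewer asked for, showing when the job was first seen in STOPPING and when the wait will give up.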
Issue Time Tracking
-------------------
Worklog Id: (was: 807225)
Time Spent: 1h 10m (was: 1h)
> Fix Bug when wait and checking helix job state till completion
> --------------------------------------------------------------
>
> Key: GOBBLIN-1702
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1702
> Project: Apache Gobblin
> Issue Type: Bug
> Components: gobblin-cluster
> Reporter: Hanghang Liu
> Assignee: Hung Tran
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Currently, HelixUtils.waitJobCompletion() has a bug: when a job is in the STOPPING
> state, it immediately tries to delete the job instead of waiting for the job to
> transition to the STOPPED state, because stoppingStateEndTime is not set
> correctly.
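One way an incorrectly set stoppingStateEndTime can cause this immediate deletion is shown by plain arithmetic. If the deadline is anchored at jobStartTimeMillis, as in the diff under review, any job that has already run longer than the timeout is past its deadline the moment it is first seen in STOPPING. The sketch below assumes the expiry check is `now > deadline`, a hypothetical 5-minute timeout, and a job first seen in STOPPING 30 minutes after it started; the `deadline` helper is illustrative, not a Gobblin method.

```java
public class StoppingDeadlineExample {
  /** Hypothetical helper: deadline = anchor time + timeout. The question is which anchor to use. */
  static long deadline(long anchorMillis, long stoppingStateTimeoutInSeconds) {
    return anchorMillis + stoppingStateTimeoutInSeconds * 1000;
  }

  public static void main(String[] args) {
    long jobStartTimeMillis = 0L;                    // job starts at t = 0
    long firstStoppingSeenMillis = 30L * 60 * 1000;  // STOPPING first observed 30 minutes later
    long stoppingStateTimeoutInSeconds = 300;        // hypothetical 5-minute timeout
    long now = firstStoppingSeenMillis;              // the first expiry check happens right away

    // Anchored at job start: deadline is 300_000 ms, already in the past at 1_800_000 ms.
    boolean expiredFromJobStart = now > deadline(jobStartTimeMillis, stoppingStateTimeoutInSeconds);
    // Anchored at the first STOPPING observation: deadline is 2_100_000 ms, a full 5 minutes away.
    boolean expiredFromFirstStopping = now > deadline(firstStoppingSeenMillis, stoppingStateTimeoutInSeconds);

    System.out.println("anchored at job start, expired: " + expiredFromJobStart);          // true
    System.out.println("anchored at first STOPPING, expired: " + expiredFromFirstStopping); // false
  }
}
```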
--
This message was sent by Atlassian Jira
(v8.20.10#820010)