hanghangliu commented on PR #3546: URL: https://github.com/apache/gobblin/pull/3546#issuecomment-1239958939
To summarize what this PR addresses: when an update-job event is received, the GobblinHelixJobScheduler stops the old job and then launches the new one. Stopping the old job used to involve a synchronous waitToStop call through Helix. However, [HelixUtils.waitJobCompletion](https://github.com/apache/gobblin/blob/8c9c8a84ed23c0215c4d80125ac532e97085d76f/gobblin-cluster/src/main/java/org/apache/gobblin/cluster/HelixUtils.java#L278) detects that the job state has changed to stopping and immediately deletes the job, which causes waitToStop to always throw an exception. Changing waitToStop to an asynchronous call avoids the exception; we then learn that the job has completed by checking the jobRunningMap, which is expected to be updated in the JobLauncher. The incorrect deletion timing in [HelixUtils.waitJobCompletion](https://github.com/apache/gobblin/blob/8c9c8a84ed23c0215c4d80125ac532e97085d76f/gobblin-cluster/src/main/java/org/apache/gobblin/cluster/HelixUtils.java#L278) will be fixed in a separate PR.
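
For readers who want the control flow spelled out, here is a minimal sketch of the "stop asynchronously, then check jobRunningMap" pattern described above. It is not the code in this PR: `UpdateJobSketch`, `stopAsync`, `handleUpdate`, `launchCount`, and the simulated stop delay are all hypothetical stand-ins, and the real scheduler goes through Helix rather than a sleeping background task. Only the name `jobRunningMap` comes from the comment.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the scheduler; none of this is Gobblin's actual API.
class UpdateJobSketch {
  // jobName -> true while the job runs; the launcher side flips it to false on completion.
  final Map<String, Boolean> jobRunningMap = new ConcurrentHashMap<>();
  final AtomicInteger launchCount = new AtomicInteger();
  private final long simulatedStopMillis; // assumption: how long the old job takes to stop

  UpdateJobSketch(long simulatedStopMillis) {
    this.simulatedStopMillis = simulatedStopMillis;
  }

  // Simulated launcher: marks the job as running.
  void launch(String jobName) {
    jobRunningMap.put(jobName, true);
    launchCount.incrementAndGet();
  }

  // Asynchronous stop: returns immediately instead of blocking on the stop request.
  // The background task plays the role of the launcher updating jobRunningMap.
  CompletableFuture<Void> stopAsync(String jobName) {
    return CompletableFuture.runAsync(() -> {
      try {
        Thread.sleep(simulatedStopMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      jobRunningMap.put(jobName, false);
    });
  }

  // Handles an update event: request the stop, poll jobRunningMap until the old
  // job is no longer running, then launch the new one.
  // Returns false if the old job did not stop within the timeout.
  boolean handleUpdate(String jobName, long timeoutMillis) {
    stopAsync(jobName);
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (jobRunningMap.getOrDefault(jobName, false)) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // old job still running; do not launch a second copy
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    launch(jobName);
    return true;
  }
}
```

The point of the sketch is that completion is observed through shared state rather than through the return of the stop call, so a stop request that cannot report completion reliably no longer blocks or fails the update path.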
