[ 
https://issues.apache.org/jira/browse/GOBBLIN-1319?focusedWorklogId=513090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-513090
 ]

ASF GitHub Bot logged work on GOBBLIN-1319:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Nov/20 18:39
            Start Date: 17/Nov/20 18:39
    Worklog Time Spent: 10m 
      Work Description: yukuai518 commented on a change in pull request #3155:
URL: https://github.com/apache/incubator-gobblin/pull/3155#discussion_r525397700



##########
File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinHelixJobScheduler.java
##########
@@ -367,6 +369,33 @@ public void 
handleDeleteJobConfigArrival(DeleteJobConfigArrivalEvent deleteJobAr
     }
   }
 
+  @Subscribe
+  public void handleCancelJobConfigArrival(CancelJobConfigArrivalEvent 
cancelJobArrival)
+      throws InterruptedException {
+    String jobUri = cancelJobArrival.getJoburi();
+    LOGGER.info("Received cancel for job configuration of job " + jobUri);
+    Optional<String> planningJob = Optional.empty();
+    Optional<String> actualJob = Optional.empty();
+
+    try {
+      planningJob = this.jobsMapping.getPlanningJobId(jobUri);
+      actualJob = this.jobsMapping.getActualJobId(jobUri);
+    } catch (IOException e) {
+      LOGGER.warn("Planning and actual jobs could not be retrieved for job 
{}", jobUri);
+      return;
+    }
+
+    if (planningJob.isPresent()) {
+      LOGGER.info("Cancelling planning job helix workflow: {}", 
planningJob.get());
+      new 
TaskDriver(this.taskDriverHelixManager.get()).waitToStop(planningJob.get(), 
this.helixJobStopTimeoutMillis);
+    }
+
+    if (actualJob.isPresent()) {
+      LOGGER.info("Cancelling actual job helix workflow: {}", actualJob.get());
+      new TaskDriver(this.jobHelixManager).waitToStop(actualJob.get(), 
this.helixJobStopTimeoutMillis);
+    }

Review comment:
       I have several questions:
   1. What is the difference between this logic vs cancelJobIfRequired() 
method, can we reuse some part of the logic?
   2. I don't know if it is safe to stop the actual job twice. When you cancel 
the planningJob, the GobblinHelixJobLauncher on the task driver node will 
receive a cancellation message and invoke its own executeCancellation() method, 
which will further cancel the actual job on the worker node. If we cancel the 
actual job directly here, do we have any conflict? Will there be any race 
condition needs some scrutify?
   3. Can you provide some comments on which method will be used by GaaS and 
which one will be the opposite since you mentioned you want a separate path?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 513090)
    Remaining Estimate: 0h
            Time Spent: 10m

> fix helix job cancellation in gobblin cluster
> ---------------------------------------------
>
>                 Key: GOBBLIN-1319
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1319
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Arjun Singh Bora
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to