[ 
https://issues.apache.org/jira/browse/GOBBLIN-847?focusedWorklogId=292349&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292349
 ]

ASF GitHub Bot logged work on GOBBLIN-847:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Aug/19 23:58
            Start Date: 09/Aug/19 23:58
    Worklog Time Spent: 10m 
      Work Description: arjun4084346 commented on pull request #2702: 
[GOBBLIN-847] Flow level sla
URL: https://github.com/apache/incubator-gobblin/pull/2702#discussion_r312679070
 
 

 ##########
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java
 ##########
 @@ -530,6 +529,44 @@ private void pollAndAdvanceDag()
       }
     }
 
+    private ExecutionStatus getJobExcecutionStatus(boolean slaKilled, JobStatus jobStatus) {
+      if (slaKilled) {
+        return CANCELLED;
+      } else {
+        if (jobStatus == null) {
+          return PENDING;
+        } else {
+          return valueOf(jobStatus.getEventName());
+        }
+      }
+    }
+
+    /**
+     * Check if the SLA is configured for this job. If it is, tries to cancel the job if SLA is reached.
+     * @param node dag node of the job
+     * @return true if the job is killed because it reached sla
+     * @throws ExecutionException exception
+     * @throws InterruptedException exception
+     */
+    private boolean slaKillIfNeeded(DagNode<JobExecutionPlan> node) throws ExecutionException, InterruptedException {
+      long flowStartTime = DagManagerUtils.getFlowStartTime(node);
+      long currentTime = System.currentTimeMillis();
+      long flowSla = DagManagerUtils.getFlowSla(node);
+
+      if (flowSla != -1L && currentTime > flowStartTime + flowSla) {
+        log.info("Job exceeded the SLA of {} ms. Killing it now...", flowSla);
+        cancelDag(DagManagerUtils.generateDagId(node));
+        if (this.eventSubmitter.isPresent()) {
+          JobExecutionPlan jobExecutionPlan = DagManagerUtils.getJobExecutionPlan(node);
+          Map<String, String> jobMetadata = TimingEventUtils.getJobMetadata(Maps.newHashMap(), jobExecutionPlan);
+          this.eventSubmitter.get().getTimingEvent(TimingEvent.LauncherTimings.JOB_CANCEL).stop(jobMetadata);
 
 Review comment:
   Fixed it: it now cancels only the one node, not the whole DAG.
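 
   For reference, the SLA comparison in the hunk above can be sketched in
   isolation. This is only a minimal, self-contained illustration, not the
   DagManager code: the class SlaCheck, the method isSlaExceeded, and the
   constant NO_SLA are hypothetical names. Only the -1 sentinel and the
   "currentTime > flowStartTime + flowSla" comparison are taken from the diff.

```java
/** Hypothetical helper; it mirrors only the SLA comparison shown in the diff. */
final class SlaCheck {
  /** Sentinel meaning "no SLA configured", matching the diff's `flowSla != -1L` check. */
  static final long NO_SLA = -1L;

  private SlaCheck() {}

  /**
   * Returns true when an SLA is configured and the current time is strictly
   * later than the flow start time plus the SLA (all values in milliseconds).
   */
  static boolean isSlaExceeded(long flowStartTimeMillis, long flowSlaMillis, long nowMillis) {
    return flowSlaMillis != NO_SLA && nowMillis > flowStartTimeMillis + flowSlaMillis;
  }

  public static void main(String[] args) {
    long flowStart = System.currentTimeMillis() - 2_000L; // pretend the flow started 2 s ago
    long now = System.currentTimeMillis();
    System.out.println("1 s SLA exceeded: " + isSlaExceeded(flowStart, 1_000L, now));
    System.out.println("no SLA exceeded:  " + isSlaExceeded(flowStart, NO_SLA, now));
  }
}
```

   Passing the current time in as a parameter, instead of reading the clock
   inside the method, keeps the check deterministic and easy to unit test.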
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 292349)
    Time Spent: 1h 40m  (was: 1.5h)

> add a flow level sla in gaas flows
> ----------------------------------
>
>                 Key: GOBBLIN-847
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-847
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Arjun Singh Bora
>            Priority: Major
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add a flow-level SLA in GaaS flows, because sometimes Azkaban jobs may not 
> start and hence never send any tracking event, or Azkaban may be down. In all 
> those cases, we might have to kill the job so we can start a new one.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
