[ 
https://issues.apache.org/jira/browse/GOBBLIN-1800?focusedWorklogId=851006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-851006
 ]

ASF GitHub Bot logged work on GOBBLIN-1800:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Mar/23 23:14
            Start Date: 14/Mar/23 23:14
    Worklog Time Spent: 10m 
      Work Description: Will-Lo commented on code in PR #3661:
URL: https://github.com/apache/gobblin/pull/3661#discussion_r1136337532


##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java:
##########
@@ -687,7 +687,6 @@ private void cancelDagNode(DagNode<JobExecutionPlan> dagNodeToCancel) throws Exe
         Future future = dagNodeToCancel.getValue().getJobFuture().get();
         String serializedFuture = DagManagerUtils.getSpecProducer(dagNodeToCancel).serializeAddSpecResponse(future);
         props.put(ConfigurationKeys.SPEC_PRODUCER_SERIALIZED_FUTURE, serializedFuture);
-        sendCancellationEvent(dagNodeToCancel.getValue());

Review Comment:
   Crap, good catch. I didn't realize it was also used for manual kill 
requests. Will add a parameter to skip sending the event on SLA kill and let 
pollAndAdvanceDag() make the decision there.
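
A minimal sketch of the parameter idea described above, not the actual 
DagManager change. Everything in it is a hypothetical stand-in: the 
CancelSketch class, the Node type (in place of DagNode<JobExecutionPlan>), 
the shouldSendCancellationEvent flag name, and the in-memory event list. It 
only illustrates that a manual kill would still emit the cancellation event 
while an SLA kill would skip it and leave the decision to a later step such 
as pollAndAdvanceDag().

```java
import java.util.ArrayList;
import java.util.List;

public class CancelSketch {
  // Hypothetical stand-in for a DAG node; not Gobblin's DagNode<JobExecutionPlan>.
  static final class Node {
    final String name;
    boolean cancelled;

    Node(String name) {
      this.name = name;
    }
  }

  // Records the names of nodes for which a cancellation event was "sent".
  static final List<String> sentEvents = new ArrayList<>();

  // Stand-in for sendCancellationEvent(...); here it only records the node name.
  static void sendCancellationEvent(Node node) {
    sentEvents.add(node.name);
  }

  // The boolean flag is the assumed new parameter: callers handling an SLA kill
  // pass false so that no event is emitted at cancel time.
  static void cancelDagNode(Node node, boolean shouldSendCancellationEvent) {
    node.cancelled = true;
    if (shouldSendCancellationEvent) {
      sendCancellationEvent(node);
    }
  }

  public static void main(String[] args) {
    Node manualKill = new Node("manual-kill");
    Node slaKill = new Node("sla-kill");

    cancelDagNode(manualKill, true);  // manual kill request: event is sent
    cancelDagNode(slaKill, false);    // SLA kill: event is deferred

    System.out.println(sentEvents);   // prints [manual-kill]
  }
}
```

Both nodes end up cancelled; only the manual kill produces an event, so a 
retry decision for the SLA-killed node can still be made afterwards.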





Issue Time Tracking
-------------------

    Worklog Id:     (was: 851006)
    Time Spent: 40m  (was: 0.5h)

> GaaS does not retry SLA killed jobs
> -----------------------------------
>
>                 Key: GOBBLIN-1800
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1800
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-service
>            Reporter: William Lo
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Gobblin-as-a-Service fails jobs when they run past their start SLA or their 
> runtime SLA. Jobs killed for exceeding an SLA should be retried if retries 
> are configured, but they currently are not.
> The DagManager should automatically retry jobs that exceed their SLAs if the 
> user configured retries, in case these flow failures are due to intermittent 
> issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
