[
https://issues.apache.org/jira/browse/GOBBLIN-1634?focusedWorklogId=763192&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-763192
]
ASF GitHub Bot logged work on GOBBLIN-1634:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 27/Apr/22 23:04
Start Date: 27/Apr/22 23:04
Worklog Time Spent: 10m
Work Description: Will-Lo commented on code in PR #3495:
URL: https://github.com/apache/gobblin/pull/3495#discussion_r860307138
##########
gobblin-metrics-libs/gobblin-metrics-base/src/main/java/org/apache/gobblin/metrics/event/TimingEvent.java:
##########
@@ -72,6 +72,8 @@ public static class FlowTimings {
public static final String FLOW_FAILED = "FlowFailed";
public static final String FLOW_RUNNING = "FlowRunning";
public static final String FLOW_CANCELLED = "FlowCancelled";
+ public static final String FLOW_SLA_KILLED = "FlowSLAKilled";
+ public static final String FLOW_START_SLA_KILLED = "FlowStartSLAKilled";
Review Comment:
I considered adding new ExecutionStatuses 😅 but I think having too many
output states could confuse users who don't particularly care about the
cause. It would also require changes across the monitoring platforms, so I
wanted to avoid that if possible.
Issue Time Tracking
-------------------
Worklog Id: (was: 763192)
Time Spent: 40m (was: 0.5h)
> GaaS Flow SLA Kills should be retryable if configured
> -----------------------------------------------------
>
> Key: GOBBLIN-1634
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1634
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: William Lo
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Gobblin as a Service (GaaS) flows can fail their SLAs if they do not receive
> a Kafka event within a designated amount of time.
> Since GaaS supports retries on failures, failures due to SLAs should also be
> retryable.
> However, if the flow is cancelled by a user-specified event through the API,
> we do not want to retry.
> Additionally, we do not want to retry if a flow is skipped because concurrent
> jobs are running at the same time: without a more sophisticated waiting
> algorithm, it is unlikely that the running job will have finished by the time
> the retry is attempted, so retrying would waste resources.
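
The retry rules in the issue description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in PR #3495: the class `SlaRetrySketch`, the method `shouldRetry`, the attempt-count parameters, and the event name `FlowSkippedConcurrent` are all hypothetical. Only the four other event-name strings are taken from the `TimingEvent.FlowTimings` hunk quoted in the review.

```java
import java.util.Set;

// Hypothetical sketch of the retry decision described in GOBBLIN-1634.
public class SlaRetrySketch {
    // Event names that appear in the TimingEvent.FlowTimings diff hunk.
    static final String FLOW_FAILED = "FlowFailed";
    static final String FLOW_CANCELLED = "FlowCancelled";
    static final String FLOW_SLA_KILLED = "FlowSLAKilled";
    static final String FLOW_START_SLA_KILLED = "FlowStartSLAKilled";

    // Assumption: a made-up name for "skipped because a concurrent run exists".
    static final String FLOW_SKIPPED_CONCURRENT = "FlowSkippedConcurrent";

    // Ordinary failures and both kinds of SLA kill are treated as retryable.
    // User cancellations and concurrency skips are deliberately left out.
    static final Set<String> RETRYABLE_EVENTS =
        Set.of(FLOW_FAILED, FLOW_SLA_KILLED, FLOW_START_SLA_KILLED);

    // "If configured" is modelled here as a maximum attempt count (assumption):
    // a flow is retried only while attempts remain.
    static boolean shouldRetry(String eventName, int attemptsSoFar, int maxAttempts) {
        return RETRYABLE_EVENTS.contains(eventName) && attemptsSoFar < maxAttempts;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(FLOW_SLA_KILLED, 1, 3));
        System.out.println(shouldRetry(FLOW_CANCELLED, 1, 3));
        System.out.println(shouldRetry(FLOW_SKIPPED_CONCURRENT, 1, 3));
    }
}
```

Keeping the cause in the event name, rather than in a new execution status, matches the reasoning in the review comment: callers that care about the cause can inspect the event, while everyone else sees the same set of output states.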
--
This message was sent by Atlassian Jira
(v8.20.7#820007)