[ https://issues.apache.org/jira/browse/GOBBLIN-1672?focusedWorklogId=797416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797416 ]
ASF GitHub Bot logged work on GOBBLIN-1672:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 02/Aug/22 22:00
Start Date: 02/Aug/22 22:00
Worklog Time Spent: 10m
Work Description: arjun4084346 commented on code in PR #3532:
URL: https://github.com/apache/gobblin/pull/3532#discussion_r936061609
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java:
##########
@@ -1132,24 +1035,39 @@ private void cleanUp() {
DagNode<JobExecutionPlan> dagNode = dagNodeList.poll();
deleteJobState(dagId, dagNode);
}
- log.info("Dag {} has finished with status FAILED; Cleaning up dag from the state store.", dagId);
- onFlowFailure(dagId);
+ Dag<JobExecutionPlan> dag = this.dags.get(dagId);
+ String status = TimingEvent.FlowTimings.FLOW_FAILED;
+ if (TimingEvent.FlowTimings.FLOW_RUN_DEADLINE_EXCEEDED.equals(dag.getFlowEvent())) {
+ this.dagManagerMetrics.emitFlowSlaExceededMetrics(DagManagerUtils.getFlowId(dag));
+ } else if (!TimingEvent.FlowTimings.FLOW_START_DEADLINE_EXCEEDED.equals(dag.getFlowEvent())) {
+ dagManagerMetrics.emitFlowFailedMetrics(DagManagerUtils.getFlowId(this.dags.get(dagId)));
+ }
+ addFailedDag(dagId);
+ log.info("Dag {} has finished with status {}; Cleaning up dag from the state store.", dagId, status);
// send an event before cleaning up dag
- DagManagerUtils.emitFlowEvent(this.eventSubmitter, this.dags.get(dagId), TimingEvent.FlowTimings.FLOW_FAILED);
+ DagManagerUtils.emitFlowEvent(this.eventSubmitter, this.dags.get(dagId), status);
dagIdstoClean.add(dagId);
}
- //Clean up completed dags
- for (String dagId : this.dags.keySet()) {
+ // Remove dags that are finished and emit their appropriate metrics
+ for (Map.Entry<String, Dag<JobExecutionPlan>> dagIdKeyPair : this.dags.entrySet()) {
+ String dagId = dagIdKeyPair.getKey();
+ Dag<JobExecutionPlan> dag = dagIdKeyPair.getValue();
if (!hasRunningJobs(dagId) && !this.failedDagIdsFinishRunning.contains(dagId)) {
String status = TimingEvent.FlowTimings.FLOW_SUCCEEDED;
if (this.failedDagIdsFinishAllPossible.contains(dagId)) {
- onFlowFailure(dagId);
+ if (TimingEvent.FlowTimings.FLOW_RUN_DEADLINE_EXCEEDED.equals(dag.getFlowEvent())) {
Review Comment:
Maybe we can move this if block inside `addFailedDag`?
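For illustration, the suggestion could look roughly like the sketch below. The same deadline check appears in both places in this hunk, so folding it into `addFailedDag` would leave a single copy. This is only a sketch under assumptions: `FailedDagSketch`, the string constants, the `emittedMetrics` list, and the two-argument `addFailedDag(dagId, flowEvent)` signature are hypothetical stand-ins, not the actual `DagManager` code or the real `TimingEvent.FlowTimings` values.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the relevant part of DagManager; not the real Gobblin class.
class FailedDagSketch {
  // Assumed placeholder values standing in for the TimingEvent.FlowTimings constants.
  static final String FLOW_RUN_DEADLINE_EXCEEDED = "FlowRunDeadlineExceeded";
  static final String FLOW_START_DEADLINE_EXCEEDED = "FlowStartDeadlineExceeded";

  final Set<String> failedDagIds = new HashSet<>();
  // Records which metric would have been emitted, in place of dagManagerMetrics.
  final List<String> emittedMetrics = new ArrayList<>();

  // The metric selection lives here, so callers no longer repeat the if/else block.
  void addFailedDag(String dagId, String flowEvent) {
    if (FLOW_RUN_DEADLINE_EXCEEDED.equals(flowEvent)) {
      // Run deadline exceeded: counted as an SLA-exceeded flow.
      emittedMetrics.add("slaExceeded:" + dagId);
    } else if (!FLOW_START_DEADLINE_EXCEEDED.equals(flowEvent)) {
      // Any failure other than a start-deadline miss: counted as a failed flow.
      emittedMetrics.add("failed:" + dagId);
    }
    // A start-deadline miss emits neither metric here, mirroring the diff above.
    failedDagIds.add(dagId);
  }
}
```

Both call sites in `cleanUp()` would then reduce to a single `addFailedDag(...)` call, assuming the dag's flow event is available at each of them.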
Issue Time Tracking
-------------------
Worklog Id: (was: 797416)
Time Spent: 1h 40m (was: 1.5h)
> Refactor metrics in dagmanager and add per spec executor metrics
> ----------------------------------------------------------------
>
> Key: GOBBLIN-1672
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1672
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-service
> Reporter: William Lo
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Add the following metrics:
> 1. Success per executor
> 2. Fail per executor
> 3. SLA killed per executor
> 4. SLA killed per flowgroup
> 5. SLA killed per user
> 6. SLA killed overall