[ 
https://issues.apache.org/jira/browse/GOBBLIN-2022?focusedWorklogId=911734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-911734
 ]

ASF GitHub Bot logged work on GOBBLIN-2022:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Mar/24 06:35
            Start Date: 27/Mar/24 06:35
    Worklog Time Spent: 10m 
      Work Description: phet commented on code in PR #3896:
URL: https://github.com/apache/gobblin/pull/3896#discussion_r1540522168


##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/DagProc.java:
##########
@@ -35,38 +36,39 @@
 
 /**
  * Responsible for performing the actual work for a given {@link DagTask} by 
first initializing its state, performing
- * actions based on the type of {@link DagTask} and finally submitting an 
event to the executor.
+ * actions based on the type of {@link DagTask}. Submitting events in time is 
important (PR#3641), hence initialize and

Review Comment:
   unless there's javadoc syntax I'm not familiar with, I'd suggest an actual 
link here - `https://github.com/apache/gobblin/pull/3896`



##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/DagProc.java:
##########
@@ -35,38 +36,39 @@
 
 /**
  * Responsible for performing the actual work for a given {@link DagTask} by 
first initializing its state, performing
- * actions based on the type of {@link DagTask} and finally submitting an 
event to the executor.
+ * actions based on the type of {@link DagTask}. Submitting events in time is 
important (PR#3641), hence initialize and
+ * act methods submit events as they happen.
  */
 @Alpha
 @Slf4j
 @RequiredArgsConstructor
-public abstract class DagProc<S, T> {
+public abstract class DagProc<S> {

Review Comment:
   looks reasonable to eliminate `T` param, although would now be customary to 
rename `S` to `T`.  please also describe the meaning of the generic type in the 
javadoc



##########
gobblin-service/src/test/java/org/apache/gobblin/service/monitoring/FsJobStatusRetrieverTest.java:
##########
@@ -51,6 +51,7 @@ public void setUp() throws Exception {
         ConfigValueFactory.fromAnyRef(stateStoreDir));
     this.jobStatusRetriever = new FsJobStatusRetriever(config, 
mock(MultiContextIssueRepository.class));
     this.mysqlDagActionStore = mock(MysqlDagActionStore.class);
+    // job status monitor stores dag action into dag action store, so need to 
mock that behavior to avoid NPE

Review Comment:
   I did initially suggest adding such a comment, but as I just mentioned, my 
opinion has evolved.  the need for mocking here seems a code smell that 
`KJSM::addJobStatusToStateStore` is doing too much and ought to be reworked.  
following that, no mocking and no comment would be required



##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/MostlyMySqlDagManagementStateStore.java:
##########
@@ -244,11 +251,12 @@ public boolean releaseQuota(Dag.DagNode<JobExecutionPlan> 
dagNode) throws IOExce
   }
 
   @Override
-  public Optional<JobStatus> getJobStatus(String flowGroup, String flowName, 
long flowExecutionId, String jobGroup, String jobName) {
-    Iterator<JobStatus> jobStatusIterator =
-        jobStatusRetriever.getJobStatusesForFlowExecution(flowName, flowGroup, 
flowExecutionId, jobName, jobGroup);
+  public Optional<JobStatus> getJobStatus(DagNodeId dagNodeId) {
+    Iterator<JobStatus> jobStatusIterator = 
this.jobStatusRetriever.getJobStatusesForFlowExecution(dagNodeId.getFlowName(),
+        dagNodeId.getFlowGroup(), dagNodeId.getFlowExecutionId(), 
dagNodeId.getJobName(), dagNodeId.getJobGroup());
 
     if (jobStatusIterator.hasNext()) {
+      // there must exist exactly one job status for a dag node id

Review Comment:
   may be worth adding either an assert or at least a `log.error` if this 
presumption is violated.  what do you think?



##########
gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitor.java:
##########
@@ -193,7 +194,7 @@ protected void 
processMessage(DecodeableKafkaRecord<byte[],byte[]> message) {
         org.apache.gobblin.configuration.State jobStatus = 
parseJobStatus(gobblinTrackingEvent);
         if (jobStatus != null) {
           try (Timer.Context context = 
getMetricContext().timer(GET_AND_SET_JOB_STATUS).time()) {
-            addJobStatusToStateStore(jobStatus, this.stateStore, 
this.eventProducer, this.dagActionStore);
+            addJobStatusToStateStore(jobStatus, this.stateStore, 
this.eventProducer, this.dagActionStore, this.dagProcEngineEnabled);

Review Comment:
   rather than pass one more param to this `static` that exists to be 
`@VisibleForTesting`, I suggested to pass two fewer params, 
`this.eventProducer` and `this.dagActionStore`, but to have it return 
`Optional<JobStatus>` only when the job state went to final.  this would leave 
the `static` much more cohesive (and testable!) by solely updating the job 
state store, without leaving side-effects anywhere else



##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/DagProc.java:
##########
@@ -35,38 +36,39 @@
 
 /**
  * Responsible for performing the actual work for a given {@link DagTask} by 
first initializing its state, performing
- * actions based on the type of {@link DagTask} and finally submitting an 
event to the executor.
+ * actions based on the type of {@link DagTask}. Submitting events in time is 
important (PR#3641), hence initialize and
+ * act methods submit events as they happen.
  */
 @Alpha
 @Slf4j
 @RequiredArgsConstructor
-public abstract class DagProc<S, T> {
+public abstract class DagProc<S> {
   protected final DagTask dagTask;
+  @Getter protected final DagManager.DagId dagId;
+  @Getter protected final DagNodeId dagNodeId;
   protected static final MetricContext metricContext = 
Instrumented.getMetricContext(new State(), DagProc.class);
   protected static final EventSubmitter eventSubmitter = new 
EventSubmitter.Builder(
       metricContext, "org.apache.gobblin.service").build();
 
-  public final void process(DagManagementStateStore dagManagementStateStore) 
throws IOException {
-    S state = initialize(dagManagementStateStore);   // todo - retry
-    T result = act(dagManagementStateStore, state);   // todo - retry
-    commit(dagManagementStateStore, result);   // todo - retry
-    log.info("{} successfully concluded actions for dagId : {}", 
getClass().getSimpleName(), getDagId());
-  }
-
-  protected DagManager.DagId getDagId() {
-    return this.dagTask.getDagId();
+  public DagProc(DagTask dagTask) {
+    this.dagTask = dagTask;
+    this.dagId = this.dagTask.getDagId();
+    this.dagNodeId = this.dagTask.getDagNodeId();
   }
 
-  protected DagNodeId getDagNodeId() {
-    return this.dagTask.getDagNodeId();
+  public final void process(DagManagementStateStore dagManagementStateStore) 
throws IOException {
+    S state = initialize(dagManagementStateStore);   // todo - retry
+    act(dagManagementStateStore, state);   // todo - retry
+    commit(dagManagementStateStore);   // todo - retry
+    log.info("{} successfully concluded actions for dagId : {}", 
getClass().getSimpleName(), this.dagId);

Review Comment:
   I don't believe we need `commit`.  do you see a purpose for it? 
   
   nit: "...concluded *processing* for dagId..."





Issue Time Tracking
-------------------

    Worklog Id:     (was: 911734)
    Time Spent: 10h 10m  (was: 10h)

> create dag proc for taking actions on job completion
> ----------------------------------------------------
>
>                 Key: GOBBLIN-2022
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2022
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Arjun Singh Bora
>            Priority: Major
>          Time Spent: 10h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to