[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4490: HDDS-7951. [Snapshot] Clean SnapDiff job and report table

via GitHub Wed, 12 Apr 2023 14:17:33 -0700


hemantk-12 commented on code in PR #4490:
URL: https://github.com/apache/ozone/pull/4490#discussion_r1164659023



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -167,8 +158,19 @@ public SnapshotDiffManager(ManagedRocksDB db,
         new ThreadPoolExecutor.CallerRunsPolicy()
     );
 
-    // TODO: [SNAPSHOT] Load jobs only if it is leader node.
-    //  It could a event-triggered form OM when node is leader and up.
+    // Ideally, loadJobsOnStartUp should run only on OM node, since SnapDiff
+    // is not HA currently and running this on all the nodes would be
+    // inefficient. Especially, when OM node restarts and loses its leadership.
+    // However, it is hard to determine if node is leader node because consensus
+    // happens inside Ratis. We can add something like Awaitility.wait() here
+    // but that is not full proof either.
+    // Hence, we decided that it is OK to let this run on all the OM nodes for
+    // now knowing that it would be inefficient.
+    // When SnapshotDiffManager loads for very first time, loadJobsOnStartUp
+    // will be no-ops for all the nodes. In subsequent restarts or upgrades,
+    // it would run on the current leader and most like on previous leader only.
+    // When we build snapDiff HA aware, we will revisit this.
+    // Details: https://github.com/apache/ozone/pull/4438#discussion_r1149788226
     this.loadJobsOnStartUp();

Review Comment:
   It would be, because the `SnapshotDiffManager` object is created during OM initialization.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -838,6 +835,18 @@ private synchronized void updateJobStatus(String jobKey,
     snapDiffJobTable.put(jobKey, snapshotDiffJob);
   }
 
+  private synchronized void updateJobStatusToDone(String jobKey,
+                                                  long totalNumberOfEntries) {
+    SnapshotDiffJob snapshotDiffJob = snapDiffJobTable.get(jobKey);
+    if (snapshotDiffJob.getStatus() != IN_PROGRESS) {

Review Comment:
   It should not matter, because `updateJobStatusToDone` is called, and should only be called, after the diff calculation and report generation have finished successfully. We could end up in that situation only if the order of job status updates gets messed up.
   
   I created this function because we update the job status together with the total number of diff entries.
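
To illustrate the invariant described above, here is a minimal, self-contained sketch. It does not use the real Ozone classes: `SnapDiffJobStatusSketch`, `Job`, `JobStatus`, and the in-memory map standing in for `snapDiffJobTable` are all hypothetical, and throwing `IllegalStateException` when the job is not `IN_PROGRESS` is an assumption, since the diff above is cut off before the body of the `if`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration only; NOT the real Ozone classes.
final class SnapDiffJobStatusSketch {

  enum JobStatus { QUEUED, IN_PROGRESS, DONE }

  // Immutable job record: status plus the number of diff entries in the report.
  static final class Job {
    final JobStatus status;
    final long totalDiffEntries;

    Job(JobStatus status, long totalDiffEntries) {
      this.status = status;
      this.totalDiffEntries = totalDiffEntries;
    }
  }

  // In-memory map standing in for snapDiffJobTable.
  private final Map<String, Job> jobTable = new ConcurrentHashMap<>();

  void put(String jobKey, Job job) {
    jobTable.put(jobKey, job);
  }

  Job get(String jobKey) {
    return jobTable.get(jobKey);
  }

  // Marks a job DONE and records the entry count in a single step.
  // Assumption: a job that is missing or not IN_PROGRESS is treated as an
  // ordering bug and rejected with an exception.
  synchronized void updateJobStatusToDone(String jobKey,
                                          long totalNumberOfEntries) {
    Job job = jobTable.get(jobKey);
    if (job == null || job.status != JobStatus.IN_PROGRESS) {
      throw new IllegalStateException("Expected an IN_PROGRESS job for key "
          + jobKey + " but found "
          + (job == null ? "no job" : job.status.toString()));
    }
    jobTable.put(jobKey, new Job(JobStatus.DONE, totalNumberOfEntries));
  }

  public static void main(String[] args) {
    SnapDiffJobStatusSketch sketch = new SnapDiffJobStatusSketch();
    sketch.put("job-1", new Job(JobStatus.IN_PROGRESS, 0L));

    // Normal order: the diff and report are finished, then the job is marked DONE.
    sketch.updateJobStatusToDone("job-1", 42L);
    System.out.println(sketch.get("job-1").status);

    // Out-of-order update: the job is already DONE, so the guard rejects it.
    try {
      sketch.updateJobStatusToDone("job-1", 7L);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With the status and the entry count written in one synchronized method, a reader of the table never sees a `DONE` job with a stale count, which is the reason for having a dedicated method rather than reusing `updateJobStatus`.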



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
