Re: [PR] HDDS-11509. logging improvements for directory deletion [ozone]

via GitHub Fri, 15 Nov 2024 09:01:40 -0800


errose28 commented on code in PR #7314:
URL: https://github.com/apache/ozone/pull/7314#discussion_r1844180851



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/AbstractKeyDeletingService.java:
##########
@@ -449,13 +453,21 @@ public long optimizeDirDeletesAndSubmitRequest(long remainNum,
       deletedDirsCount.addAndGet(dirNum + subdirDelNum);
       movedDirsCount.addAndGet(subDirNum - subdirDelNum);
       movedFilesCount.addAndGet(subFileNum);
+      long timeTakenInIteration = Time.monotonicNow() - startTime;
       LOG.info("Number of dirs deleted: {}, Number of sub-dir " +
               "deleted: {}, Number of sub-files moved:" +
               " {} to DeletedTable, Number of sub-dirs moved {} to " +
-              "DeletedDirectoryTable, iteration elapsed: {}ms," +
+              "DeletedDirectoryTable, limit per iteration: {}, iteration elapsed: {}ms, " +
               " totalRunCount: {}",
-          dirNum, subdirDelNum, subFileNum, (subDirNum - subdirDelNum),
-          Time.monotonicNow() - startTime, getRunCount());
+          dirNum, subdirDelNum, subFileNum, (subDirNum - subdirDelNum), limit,
+          timeTakenInIteration, getRunCount());
+      if (remainNum <= 0) {
+        LOG.warn("Limit for no. of directory+files that can be deleted in one iteration is reached. " +
+                "Current limit: {} = {}",
+            OMConfigKeys.OZONE_PATH_DELETING_LIMIT_PER_TASK, ozoneManager.getConfiguration()
+                .getInt(OMConfigKeys.OZONE_PATH_DELETING_LIMIT_PER_TASK,
+                    OMConfigKeys.OZONE_PATH_DELETING_LIMIT_PER_TASK_DEFAULT));
+      }

Review Comment:
   I guess the Jira description doesn't currently specify what to log when the deletion limits are hit:
   > At the end of each run, an INFO log should display how many items were processed, what the current limit of items per run is, and how long it took.
   > If the run time of the iteration exceeded the configured interval, the message should highlight this and be elevated to WARN.
   > This should also log the config keys that can remedy this for batch size, interval, or number of threads.
   
   However, based on past experience troubleshooting slow deletion in a busy cluster with many other events going on, I would be in favor of also elevating this to WARN when the limits are being hit, to indicate that the system is under load and deletion may be under-provisioned. Per the Jira, though, this should be one single summary message carrying all of this information, logged at the level that corresponds to its content.
   
   Also worth noting: once the multi-threaded implementation is complete, it should take a lot more to hit the deletion limit, and this message will not seem so spurious if 10 threads are being maxed out trying to process the current deletions.
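
To make the "one single message" suggestion concrete, here is a rough sketch of how the summary and its level could be derived together. Everything in it is hypothetical and not taken from the PR: the class `DeletionRunSummary`, the method `summarize`, its parameters, and the message wording are illustrative only. It also assumes that "limit hit" can be expressed as `processed >= limitPerRun`, whereas the PR itself checks `remainNum <= 0`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one summary per deletion run, level chosen from its content. */
final class DeletionRunSummary {
  enum Level { INFO, WARN }

  final Level level;
  final String message;

  private DeletionRunSummary(Level level, String message) {
    this.level = level;
    this.message = message;
  }

  /**
   * Builds a single summary for one run. The level is WARN if the per-run
   * limit was reached or the run took longer than the configured interval,
   * and INFO otherwise. tuningKeys holds the config key names to mention
   * in the WARN case (supplied by the caller, not hard-coded here).
   */
  static DeletionRunSummary summarize(long processed, long limitPerRun,
      long elapsedMs, long intervalMs, List<String> tuningKeys) {
    List<String> reasons = new ArrayList<>();
    if (processed >= limitPerRun) {
      reasons.add("per-run limit reached");
    }
    if (elapsedMs > intervalMs) {
      reasons.add("run time exceeded the configured interval");
    }

    StringBuilder msg = new StringBuilder()
        .append("Directory deletion run: processed ").append(processed)
        .append(" items, limit per run ").append(limitPerRun)
        .append(", elapsed ").append(elapsedMs).append("ms")
        .append(", interval ").append(intervalMs).append("ms");
    if (reasons.isEmpty()) {
      return new DeletionRunSummary(Level.INFO, msg.toString());
    }
    // Same message, extended with the reasons and the keys that can remedy them.
    msg.append(". ").append(String.join(" and ", reasons))
        .append("; config keys to review: ")
        .append(String.join(", ", tuningKeys));
    return new DeletionRunSummary(Level.WARN, msg.toString());
  }
}
```

The caller would then log `summary.message` once, with `LOG.warn` when `summary.level` is `WARN` and `LOG.info` otherwise, so the counts, the limit, and the timing always arrive in the same line regardless of level.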
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
