sumitagrawl commented on code in PR #7349:
URL: https://github.com/apache/ozone/pull/7349#discussion_r1833721607


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/DirectoryDeletingService.java:
##########
@@ -169,27 +210,27 @@ public BackgroundTaskResult call() {
             = new ArrayList<>((int) remainNum);
 
         Table.KeyValue<String, OmKeyInfo> pendingDeletedDirInfo;
-
-        try (TableIterator<String, ? extends KeyValue<String, OmKeyInfo>>
-                 deleteTableIterator = getOzoneManager().getMetadataManager().
-            getDeletedDirTable().iterator()) {
-          // This is to avoid race condition b/w purge request and snapshot chain updation. For AOS taking the global
-          // snapshotId since AOS could process multiple buckets in one iteration.
+        // This is to avoid race condition b/w purge request and snapshot chain updation. For AOS taking the global
+        // snapshotId since AOS could process multiple buckets in one iteration.
+        try {
           UUID expectedPreviousSnapshotId =
-              ((OmMetadataManagerImpl)getOzoneManager().getMetadataManager()).getSnapshotChainManager()
+              ((OmMetadataManagerImpl) getOzoneManager().getMetadataManager()).getSnapshotChainManager()
                   .getLatestGlobalSnapshotId();
 
           long startTime = Time.monotonicNow();
-          while (remainNum > 0 && deleteTableIterator.hasNext()) {
-            pendingDeletedDirInfo = deleteTableIterator.next();
+          while (remainNum > 0) {
+            pendingDeletedDirInfo = deletedDirSupplier.get();

Review Comment:
   There is a concern with fetching batches of 10K: most of each batch will be
   wasted, given the logic:
   - Get the parent
   -- Scan its subdirs and subfiles
   -- If the subdirs/subfiles alone meet the limit, the remaining 9.999k
   records are not read in this iteration.
   
   Limit: the Ratis size OR the configured limit; this limit is per thread.
   
   So it's suggested not to have a cache. Additionally, multiple iterators
   raise the concern of duplicate reads or of managing gaps between threads.
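
To make the alternative concrete, here is a minimal sketch of a single shared iterator handed out one entry at a time, with no prefetch cache. It is an illustration under assumptions, not the PR's code: `SharedDirSupplier` and `SharedSupplierDemo` are hypothetical names, and a plain `java.util.Iterator` stands in for the deleted-directory table iterator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for deletedDirSupplier: one iterator, no prefetch cache. */
final class SharedDirSupplier<T> {
  private final Iterator<T> iterator;

  SharedDirSupplier(Iterator<T> iterator) {
    this.iterator = iterator;
  }

  /** Returns the next entry, or null once the underlying iterator is exhausted. */
  synchronized T get() {
    return iterator.hasNext() ? iterator.next() : null;
  }
}

public class SharedSupplierDemo {
  /** Drains `total` entries with `threads` workers and returns everything they read. */
  static List<Integer> drain(int total, int threads) throws InterruptedException {
    List<Integer> source = new ArrayList<>();
    for (int i = 0; i < total; i++) {
      source.add(i);
    }
    SharedDirSupplier<Integer> supplier = new SharedDirSupplier<>(source.iterator());
    List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        Integer entry;
        // Each call reads exactly one entry, so nothing is fetched and then discarded.
        while ((entry = supplier.get()) != null) {
          seen.add(entry);
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.SECONDS);
    return seen;
  }

  public static void main(String[] args) throws InterruptedException {
    List<Integer> seen = drain(1000, 10);
    System.out.println("entries read: " + seen.size());
  }
}
```

Because every thread goes through the same synchronized `get()`, each entry is read by exactly one thread, which sidesteps both duplicate reads and gaps. The cost is lock contention on the iterator, which the timing scenarios would have to measure.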
   
   So we can test various scenarios to check whether an iteration can be completed in a minute:
   1) 10 threads, each thread limit: 10k, flat directories only within a parent, count: 200k
   2) 50 threads, each thread limit: 10k, flat directories only within a parent, count: 2M
   3) 10 threads, each thread limit: 50k, flat directories only within a parent, count: 2M
   
   This lets us compare cases 2 and 3, using the info log for time taken,
   since that log is already present.
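
One way to read cases 2 and 3 is that both allow the same aggregate number of directories per run (threads × per-thread limit), so they isolate the effect of thread count. The arithmetic below is a rough sketch under the assumption that every thread fills its limit in each run. `ScenarioBudget` and its methods are hypothetical helpers, not part of Ozone.

```java
public class ScenarioBudget {
  /** Upper bound on directories one run can take, assuming every thread fills its limit. */
  static long perRunCeiling(int threads, long limitPerThread) {
    return threads * limitPerThread;
  }

  /** Minimum number of runs needed to cover totalDirs at that ceiling (rounded up). */
  static long minRuns(long totalDirs, int threads, long limitPerThread) {
    long ceiling = perRunCeiling(threads, limitPerThread);
    return (totalDirs + ceiling - 1) / ceiling;
  }

  public static void main(String[] args) {
    System.out.println("case 1 runs: " + minRuns(200_000L, 10, 10_000L));
    System.out.println("case 2 runs: " + minRuns(2_000_000L, 50, 10_000L));
    System.out.println("case 3 runs: " + minRuns(2_000_000L, 10, 50_000L));
  }
}
```

Under that assumption, cases 2 and 3 need the same minimum number of runs, so any difference in the logged time would come from how the work is split across threads rather than from the total budget.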
   cc: @aryangupta1998 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
