devmadhuu commented on code in PR #4206:
URL: https://github.com/apache/ozone/pull/4206#discussion_r1100986203


##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java:
##########
@@ -76,27 +76,38 @@ public FileSizeCountTask(FileCountBySizeDao 
fileCountBySizeDao,
    */
   @Override
   public Pair<String, Boolean> reprocess(OMMetadataManager omMetadataManager) {
-    Table<String, OmKeyInfo> omKeyInfoTable =
-        omMetadataManager.getKeyTable(getBucketLayout());
+    // Map to store the count of files based on file size
     Map<FileSizeCountKey, Long> fileSizeCountMap = new HashMap<>();
+
+    // Call reprocessBucket method for FILE_SYSTEM_OPTIMIZED bucket layout
+    reprocessBucket(BucketLayout.FILE_SYSTEM_OPTIMIZED, omMetadataManager,
+        fileSizeCountMap);
+    // Call reprocessBucket method for LEGACY bucket layout
+    reprocessBucket(BucketLayout.LEGACY, omMetadataManager, fileSizeCountMap);
+
+    // Delete all records from FILE_COUNT_BY_SIZE table
+    int execute = dslContext.delete(FILE_COUNT_BY_SIZE).execute();
+    LOG.info("Deleted {} records from {}", execute, FILE_COUNT_BY_SIZE);
+    writeCountsToDB(true, fileSizeCountMap);
+    LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+    return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  private void reprocessBucket(BucketLayout bucketLayout,
+                               OMMetadataManager omMetadataManager,
+                               Map<FileSizeCountKey, Long> fileSizeCountMap) {
+    Table<String, OmKeyInfo> omKeyInfoTable =
+        omMetadataManager.getKeyTable(bucketLayout);
     try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
-        keyIter = omKeyInfoTable.iterator()) {
+             keyIter = omKeyInfoTable.iterator()) {
       while (keyIter.hasNext()) {
         Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
         handlePutKeyEvent(kv.getValue(), fileSizeCountMap);

Review Comment:
   @ArafatKhan2198 - thanks for working on this patch. I think this 
"fileSizeCountMap" may occupy a large amount of memory when holding key/count 
data for a large number of volumes: if each volume has many buckets with 
varying file size ranges, each combination yields a distinct key and the map 
keeps growing. Please see if you can flush the map and write its data to the 
DB once the map reaches about 1M keys. The condition is rare, but it is still 
better to take the precaution.
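
   The suggested safeguard can be sketched as a small, self-contained helper. This is only an illustration of the threshold-based flush pattern, not Ozone's actual API: the class name `FileSizeCountFlusher`, the configurable threshold, and the stand-in flush logic (which would be `writeCountsToDB(...)` in the real task) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch: accumulate per-size-bucket counts in memory,
 * but flush to the backing store whenever the map holds too many
 * distinct keys, bounding memory use during a large reprocess run.
 */
public class FileSizeCountFlusher {
  private final int flushThreshold;          // e.g. 1_000_000 per the review
  private final Map<Long, Long> counts = new HashMap<>();
  private long totalFlushed = 0;             // stand-in for rows written to DB

  public FileSizeCountFlusher(int flushThreshold) {
    this.flushThreshold = flushThreshold;
  }

  /** Count one key in the given file-size bucket; auto-flush at threshold. */
  public void increment(long sizeBucket) {
    counts.merge(sizeBucket, 1L, Long::sum);
    if (counts.size() >= flushThreshold) {
      flush();
    }
  }

  /** Stand-in for writeCountsToDB(...): drain the map to the store. */
  public void flush() {
    totalFlushed += counts.values().stream().mapToLong(Long::longValue).sum();
    counts.clear();
  }

  public int pendingKeys() {
    return counts.size();
  }

  public long totalFlushed() {
    return totalFlushed;
  }
}
```

   With this shape, `reprocessBucket` would call `increment(...)` per key instead of growing an unbounded `HashMap`, and a final `flush()` after iteration writes any remainder.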



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

