[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.


zhangyue19921010 commented on a change in pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#discussion_r777901212




##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##########
@@ -134,12 +161,199 @@ public boolean archiveIfRequired(HoodieEngineContext context) throws IOException
         LOG.info("No Instants to archive");
       }
 
+      if (config.getArchiveAutoMergeEnable()) {
+        mergeArchiveFilesIfNecessary(context);
+      }
       return success;
     } finally {
       close();
     }
   }
 
+  private void mergeArchiveFilesIfNecessary(HoodieEngineContext context) throws IOException {
+    Path planPath = new Path(metaClient.getArchivePath(), mergeArchivePlanName);
+    // Flush reminded content if existed and open a new write
+    reOpenWriter();
+    // List all archive files
+    FileStatus[] fsStatuses = metaClient.getFs().globStatus(

Review comment:
   Nice catch. I just found out that it may be important to keep the original instant order of the small archive files.
   
https://github.com/apache/hudi/blob/b5f05fd153df29a8be377404a14a0ced2f00b4bf/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java#L219
   When loading archived instants, Hudi relies on this order to skip reading unnecessary archive files.
   
https://github.com/apache/hudi/blob/b5f05fd153df29a8be377404a14a0ced2f00b4bf/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java#L243
   
   So I just use the same ordering comparator here.
   What do you think? :)
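
   To illustrate why the ordering matters, here is a minimal sketch that sorts archive file names by their numeric version rather than lexicographically. It is not the code from the PR or from HoodieArchivedTimeline: the class name `ArchiveOrderSketch`, the helper names, and the file-name layout `.commits_.archive.<version>` are assumptions made for illustration, and plain strings stand in for `FileStatus`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hudi.
class ArchiveOrderSketch {
  // Assumed name layout: ".commits_.archive.<version>", optionally followed by more text.
  private static final Pattern VERSION =
      Pattern.compile("^\\.commits_\\.archive\\.([0-9]+).*$");

  // Extracts the numeric version; names that do not match sort first as version 0.
  static int versionOf(String fileName) {
    Matcher m = VERSION.matcher(fileName);
    return m.matches() ? Integer.parseInt(m.group(1)) : 0;
  }

  // Returns a copy of the input sorted by ascending numeric version.
  static List<String> sortByVersion(List<String> fileNames) {
    List<String> sorted = new ArrayList<>(fileNames);
    sorted.sort(Comparator.comparingInt(ArchiveOrderSketch::versionOf));
    return sorted;
  }
}
```

   With a plain string sort, version 10 would land before version 2; comparing the parsed version keeps the files in the order they were written, so a merge that follows this order would preserve the instant order a reader expects.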




