[GitHub] [hudi] danny0405 commented on a change in pull request #5030: [HUDI-3617] MOR compact improve

GitBox Sun, 13 Mar 2022 23:30:07 -0700


danny0405 commented on a change in pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#discussion_r825624914




##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -280,8 +281,11 @@ HoodieCompactionPlan generateCompactionPlan(
         .getLatestFileSlices(partitionPath)
         .filter(slice -> !fgIdsInPendingCompactionAndClustering.contains(slice.getFileGroupId()))
         .map(s -> {
+          // In most business scenarios, the latest data is in the latest delta log file, so we sort it from large
+          // to small according to the instance time, which can largely avoid rewriting the data in the
+          // compact process, and then optimize the compact time
           List<HoodieLogFile> logFiles =

Review comment:
       What do you mean by `avoid rewriting the data in the compact process`
here? Shouldn't the reader produce the same merged content regardless of the
order in which the log files are read?
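
       To make the question concrete, here is a minimal sketch of the property
I have in mind. It is not Hudi code: `MergeOrderSketch`, `Rec`, `merge`, and
`sampleLogFiles` are hypothetical names, and the merge rule (per key, keep the
record with the largest ordering value, with ordering values distinct per key)
is an assumption made for illustration. Under that assumption, reading the log
files oldest first or newest first gives the same merged content.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MergeOrderSketch {
  // Hypothetical stand-in for a log record; not a Hudi class.
  static final class Rec {
    final String key;
    final long orderingVal;
    final String value;

    Rec(String key, long orderingVal, String value) {
      this.key = key;
      this.orderingVal = orderingVal;
      this.value = value;
    }
  }

  // Assumed merge rule: per key, keep the record with the largest ordering value.
  static Map<String, String> merge(List<List<Rec>> logFiles) {
    Map<String, Rec> latest = new HashMap<>();
    for (List<Rec> file : logFiles) {
      for (Rec r : file) {
        latest.merge(r.key, r, (old, cur) -> cur.orderingVal > old.orderingVal ? cur : old);
      }
    }
    Map<String, String> out = new HashMap<>();
    latest.forEach((k, r) -> out.put(k, r.value));
    return out;
  }

  // Three toy "log files", listed oldest first.
  static List<List<Rec>> sampleLogFiles() {
    return Arrays.asList(
        Arrays.asList(new Rec("a", 1, "a-v1"), new Rec("b", 1, "b-v1")),
        Arrays.asList(new Rec("a", 2, "a-v2")),
        Arrays.asList(new Rec("b", 3, "b-v3"), new Rec("c", 3, "c-v3")));
  }

  public static void main(String[] args) {
    List<List<Rec>> oldestFirst = sampleLogFiles();
    List<List<Rec>> newestFirst = new ArrayList<>(oldestFirst);
    Collections.reverse(newestFirst);
    // Compare the merged content produced by the two read orders.
    System.out.println(merge(oldestFirst).equals(merge(newestFirst)));
  }
}
```

       If the real merge behaves differently (for example, ties on the
ordering value resolved by arrival order), then read order could matter, and
that would be worth spelling out in the code comment.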




