[GitHub] [hudi] scxwhite commented on a change in pull request #4400: [HUDI-3069] compact improve

GitBox Fri, 14 Jan 2022 18:45:50 -0800


scxwhite commented on a change in pull request #4400:
URL: https://github.com/apache/hudi/pull/4400#discussion_r785259224




##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -264,8 +264,11 @@ HoodieCompactionPlan generateCompactionPlan(
         .getLatestFileSlices(partitionPath)
         .filter(slice -> !fgIdsInPendingCompactionAndClustering.contains(slice.getFileGroupId()))
         .map(s -> {
+          // We can think that the latest data is in the latest delta log file, so we sort it from large

Review comment:
       > I think you are assuming the later writes in the log always overwrite the earlier ones? This is not always true.

   In the compaction plan generation phase, I only changed the order in which the delta log files are read. I have been running this change in our internal production environment for a month (with clustering, cleaning, and compaction all inline), and no data exceptions have occurred. That said, I am not sure how best to test this part. Could you give me some suggestions?
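
   One possible starting point for such a test is a check that the merged result does not depend on the order in which log files are read. The sketch below uses hypothetical stand-in types (`LogOrderCheck`, `Rec`, `mergeLastReadWins`, `mergeByOrdering`), not Hudi classes, and assumes each record carries a numeric ordering value. It only illustrates the concern in the quoted comment: a "last record read wins" merge changes when the read order is reversed, while a merge that picks the largest ordering value does not (as long as there are no ties).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Hudi classes.
class LogOrderCheck {

    // One record in a delta log: key, ordering value (e.g. an event time), payload.
    static final class Rec {
        final String key;
        final long ordering;
        final String value;

        Rec(String key, long ordering, String value) {
            this.key = key;
            this.ordering = ordering;
            this.value = value;
        }
    }

    // Policy A: whichever record is read last wins, so the result depends on read order.
    static Map<String, String> mergeLastReadWins(List<List<Rec>> logFiles) {
        Map<String, String> out = new HashMap<>();
        for (List<Rec> file : logFiles) {
            for (Rec r : file) {
                out.put(r.key, r.value);
            }
        }
        return out;
    }

    // Policy B: the record with the strictly largest ordering value wins.
    // With no ties in the ordering value, the result is independent of read order.
    static Map<String, String> mergeByOrdering(List<List<Rec>> logFiles) {
        Map<String, Rec> best = new HashMap<>();
        for (List<Rec> file : logFiles) {
            for (Rec r : file) {
                Rec current = best.get(r.key);
                if (current == null || r.ordering > current.ordering) {
                    best.put(r.key, r);
                }
            }
        }
        Map<String, String> out = new HashMap<>();
        for (Rec r : best.values()) {
            out.put(r.key, r.value);
        }
        return out;
    }

    // Two log files; the later file holds a late-arriving, older version of k1.
    static List<List<Rec>> sampleLogs() {
        List<Rec> log1 = Arrays.asList(new Rec("k1", 10L, "a"), new Rec("k2", 5L, "x"));
        List<Rec> log2 = Arrays.asList(new Rec("k1", 7L, "b"));
        return Arrays.asList(log1, log2);
    }

    // Returns a copy of the file list in reverse read order.
    static List<List<Rec>> reversed(List<List<Rec>> logFiles) {
        List<List<Rec>> copy = new ArrayList<>(logFiles);
        Collections.reverse(copy);
        return copy;
    }
}
```

   A real test would need to run the same comparison through the actual compaction path, with whichever payload class the table is configured to use, on log files written so that a later file contains an older version of a key.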
   




