scx created HUDI-3069:
-------------------------

             Summary: compact improve
                 Key: HUDI-3069
                 URL: https://issues.apache.org/jira/browse/HUDI-3069
             Project: Apache Hudi
          Issue Type: Improvement
          Components: Common Core
            Reporter: scx
             Fix For: 0.11.0


I found that when the compaction plan is generated, the delta log files under
each file group are arranged in the natural order of instant time. In the
majority of cases, the latest data is in the latest delta log file, so we can
sort the log files in descending order of instant time instead. This largely
avoids rewriting data during compaction and therefore reduces compaction
time.
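
A minimal sketch of the proposed ordering, not Hudi's actual code. The
LogFile type, the order_newest_first helper, the file paths and the instant
values are all hypothetical; the sketch assumes instant times are fixed-width
timestamp strings, so that string comparison matches chronological order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogFile:
    """Hypothetical stand-in for a delta log file in one file group."""
    path: str
    instant_time: str  # assumed fixed-width timestamp, e.g. "20211220103000"


def order_newest_first(log_files):
    """Return the log files sorted by instant time, latest first."""
    return sorted(log_files, key=lambda f: f.instant_time, reverse=True)


# Illustrative input in natural (ascending) instant-time order.
logs = [
    LogFile("fg1.log.1", "20211220101500"),
    LogFile("fg1.log.2", "20211220103000"),
    LogFile("fg1.log.3", "20211220104500"),
]
ordered = order_newest_first(logs)
ordered_paths = [f.path for f in ordered]
```

With this order, the first version of a key that the reader meets is usually
the one that survives, so later (older) versions can be dropped without
touching the stored record.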

In addition, when reading a delta log file, we compare each record already
held in the ExternalSpillableMap with the incoming delta log record. If the
old record is selected, there is no need to write it back to the
ExternalSpillableMap. Rewriting it wastes a lot of resources once data has
spilled to disk.
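
The effect of skipping that write can be sketched as follows. This is an
illustration under stated assumptions, not the Hudi implementation:
CountingStore is a hypothetical in-memory stand-in for the spillable map that
only counts writes, records are modeled as (ordering value, payload) tuples,
and ties are assumed to keep the existing record.

```python
class CountingStore(dict):
    """Hypothetical stand-in for a spillable map; counts every write."""

    def __init__(self):
        super().__init__()
        self.writes = 0

    def __setitem__(self, key, value):
        self.writes += 1
        super().__setitem__(key, value)


def merge_record(store, key, incoming):
    """Write `incoming` only if it beats the record already stored.

    Records are (ordering_value, payload) tuples. Returns True if a
    write happened, False if the existing record was kept untouched.
    """
    existing = store.get(key)
    if existing is not None and existing[0] >= incoming[0]:
        return False  # old record selected: skip the write entirely
    store[key] = incoming
    return True


def replay(records):
    """Merge (key, ordering_value, payload) records in the given order."""
    store = CountingStore()
    for key, ordering, payload in records:
        merge_record(store, key, (ordering, payload))
    return store


oldest_first = [("k1", 1, "a"), ("k1", 2, "b"), ("k1", 3, "c")]
newest_first = list(reversed(oldest_first))

store_old = replay(oldest_first)
store_new = replay(newest_first)
```

Both replays end with the same record for k1, but the oldest-first replay
writes on every update while the newest-first replay writes once. This is why
the two changes in this issue are meant to work together.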

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
