scx created HUDI-3069:
-------------------------
Summary: compact improve
Key: HUDI-3069
URL: https://issues.apache.org/jira/browse/HUDI-3069
Project: Apache Hudi
Issue Type: Improvement
Components: Common Core
Reporter: scx
Fix For: 0.11.0
I found that when the compaction plan is generated, the delta log files under
each file group are arranged in the natural (ascending) order of instant time.
In the majority of cases, we can assume that the latest data is in the latest
delta log file. If we instead sort the log files from largest to smallest
instant time, the newest version of a record is usually read first, which can
largely avoid rewriting data during compaction and so reduce compaction time.
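
A minimal sketch of the proposed ordering, for illustration only. The LogFile
class and the newestFirst helper are hypothetical names, not Hudi's actual
classes, and the sketch assumes instant times are fixed-width timestamp
strings, so that string comparison matches chronological order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LogFileOrdering {
    // Hypothetical stand-in for a delta log file; not Hudi's actual class.
    static final class LogFile {
        final String path;
        final String instantTime; // assumed fixed-width, e.g. "20211220103000"

        LogFile(String path, String instantTime) {
            this.path = path;
            this.instantTime = instantTime;
        }
    }

    // Returns a copy of the input sorted from the largest (newest) instant
    // time to the smallest (oldest). The input list is left unchanged.
    static List<LogFile> newestFirst(List<LogFile> logFiles) {
        List<LogFile> sorted = new ArrayList<>(logFiles);
        sorted.sort(Comparator.comparing((LogFile f) -> f.instantTime).reversed());
        return sorted;
    }
}
```

With this order, the first version of a key that the reader sees is usually
the one that survives the merge, so older versions read later can be dropped
without touching the stored record.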
In addition, when reading a delta log file, we compare each record from the
log with the record already held in the external spillable map. If the old
record (the one already in the map) is selected, there is no need to write it
back into the external spillable map. Rewriting it wastes a lot of resources
once the data has been spilled to disk.
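
A minimal sketch of the skip-rewrite idea, under stated assumptions. A plain
HashMap stands in for the external spillable map, and the Record class, the
orderingValue field, and the "larger ordering value wins" rule are
hypothetical simplifications, not Hudi's actual merge logic.

```java
import java.util.HashMap;
import java.util.Map;

final class SkipRewriteMerge {
    // Hypothetical record: key, a value used to pick the winner, and a payload.
    static final class Record {
        final String key;
        final long orderingValue;
        final String payload;

        Record(String key, long orderingValue, String payload) {
            this.key = key;
            this.orderingValue = orderingValue;
            this.payload = payload;
        }
    }

    // Stand-in for the external spillable map (assumption: a put may hit disk).
    private final Map<String, Record> records = new HashMap<>();
    private int writes = 0;

    // Merges one record read from a delta log file into the map.
    void process(Record incoming) {
        Record existing = records.get(incoming.key);
        if (existing == null) {
            records.put(incoming.key, incoming);
            writes++;
            return;
        }
        // Assumed merge rule: keep the record with the larger ordering value.
        Record winner = existing.orderingValue >= incoming.orderingValue ? existing : incoming;
        if (winner != existing) {
            // Only write when the stored record actually changes.
            records.put(incoming.key, winner);
            writes++;
        }
    }

    int writes() {
        return writes;
    }

    Record get(String key) {
        return records.get(key);
    }
}
```

In this sketch, feeding three versions of one key newest first costs a single
write, while feeding them oldest first costs three, which is why the two
changes work well together.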
--
This message was sent by Atlassian Jira
(v8.20.1#820001)