[
https://issues.apache.org/jira/browse/HUDI-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462407#comment-17462407
]
scx commented on HUDI-3069:
---------------------------
The PR: https://github.com/apache/hudi/pull/4400
> compact improve
> ---------------
>
> Key: HUDI-3069
> URL: https://issues.apache.org/jira/browse/HUDI-3069
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Common Core
> Reporter: scx
> Priority: Major
> Labels: performance, pull-request-available
> Fix For: 0.11.0
>
>
> I found that when the compaction plan is generated, the delta log files
> under each file group are arranged in the natural (ascending) order of
> instant time. In the majority of cases, the latest data is in the latest
> delta log file, so we instead sort the files by instant time from largest
> to smallest. This largely avoids rewriting data during compaction and so
> reduces compaction time.
> In addition, when reading a delta log file, we compare the record already
> held in the ExternalSpillableMap with the record from the delta log. If the
> old record is selected, there is no need to write it back into the
> ExternalSpillableMap. Rewriting data wastes a lot of resources once the
> data has spilled to disk.
>
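
A minimal sketch of the two ideas in the description, not the code from the
PR. It assumes that an integer ordering field decides which record wins and
that log files are identified by a sortable instant-time string. Record,
CountingMap and merge_log_files are hypothetical names used only for
illustration; CountingMap is a stand-in for a spillable map that counts
writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    ordering: int  # assumed ordering field: the higher value wins
    value: str


class CountingMap(dict):
    """Stand-in for a spillable map; counts every write."""

    def __init__(self):
        super().__init__()
        self.writes = 0

    def __setitem__(self, key, value):
        self.writes += 1
        super().__setitem__(key, value)


def merge_log_files(log_files, newest_first):
    """Merge (instant_time, records) pairs into one map of winning records."""
    ordered = sorted(log_files, key=lambda f: f[0], reverse=newest_first)
    merged = CountingMap()
    for _instant, records in ordered:
        for rec in records:
            old = merged.get(rec.key)
            # If the old record is selected, skip the write entirely.
            if old is not None and old.ordering >= rec.ordering:
                continue
            merged[rec.key] = rec
    return merged


log_files = [
    ("20211220100000", [Record("a", 1, "v1"), Record("b", 1, "v1")]),
    ("20211220110000", [Record("a", 2, "v2")]),
    ("20211220120000", [Record("a", 3, "v3")]),
]

oldest_first = merge_log_files(log_files, newest_first=False)
newest_first = merge_log_files(log_files, newest_first=True)
print(oldest_first.writes, newest_first.writes)
```

Both orders produce the same merged records. With this sample data the
oldest-first order performs 4 map writes and the newest-first order performs
2, because the newest value for key "a" is written once and the older
versions are skipped.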