[
https://issues.apache.org/jira/browse/HUDI-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu updated HUDI-4313:
-----------------------------
Description:
We need to support a new compaction strategy called
LogFileModTimeBasedCompactionStrategy.
With this strategy, file slices are chosen for compaction based on the earliest
modification time of their log files.
This is similar to LogFileSizeBasedCompactionStrategy, except that instead of
comparing the total log file size of a given file slice, we use the earliest
log file modification time of that file slice.
The goal is to compact some part of the whole change set (say, 20%) in one
batch.
The compaction plan for the next batch should include incomplete operations
from previous plans.
Operations should be processed in order of earliest log file modification time.
was:
We need to support a new compaction strategy called
LogFileModTimeBasedCompactionStrategy.
With this strategy, file slices are chosen for compaction based on the earliest
modification time of their log files.
This is similar to LogFileSizeBasedCompactionStrategy, except that instead of
comparing the total log file size of a given file slice, we use the earliest
log file modification time of that file slice.
> Support LogFileModTimeBasedCompactionStrategy
> ---------------------------------------------
>
> Key: HUDI-4313
> URL: https://issues.apache.org/jira/browse/HUDI-4313
> Project: Apache Hudi
> Issue Type: Improvement
> Components: compaction
> Reporter: sivabalan narayanan
> Priority: Major
>
> We need to support a new compaction strategy called
> LogFileModTimeBasedCompactionStrategy.
> With this strategy, file slices are chosen for compaction based on the
> earliest modification time of their log files.
> This is similar to LogFileSizeBasedCompactionStrategy, except that instead of
> comparing the total log file size of a given file slice, we use the earliest
> log file modification time of that file slice.
> The goal is to compact some part of the whole change set (say, 20%) in one
> batch.
> The compaction plan for the next batch should include incomplete operations
> from previous plans.
> Operations should be processed in order of earliest log file modification
> time.
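
For illustration only, the selection rule described in the issue could be
sketched roughly as follows. This is not Hudi code and does not use Hudi's
strategy API: SliceOp, selectBatch and the fraction parameter are hypothetical
names. It also assumes that "some part of the change set" is measured by
operation count and that "in order of earliest modification time" means oldest
first; the issue specifies neither.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LogFileModTimeSketch {

    /** Hypothetical stand-in for one pending compaction operation on a file slice. */
    static final class SliceOp {
        final String fileId;
        final List<Long> logFileModTimes; // epoch millis, one entry per log file

        SliceOp(String fileId, List<Long> logFileModTimes) {
            this.fileId = fileId;
            this.logFileModTimes = logFileModTimes;
        }

        /** Earliest log file modification time; a slice with no log files sorts last. */
        long earliestModTime() {
            return logFileModTimes.stream()
                    .mapToLong(Long::longValue)
                    .min()
                    .orElse(Long.MAX_VALUE);
        }
    }

    /**
     * Merges operations left incomplete by previous plans with newly pending ones,
     * orders them by earliest log file modification time (assumed: oldest first),
     * and keeps the leading fraction of them (assumed: measured by operation count).
     */
    static List<SliceOp> selectBatch(List<SliceOp> carriedOver,
                                     List<SliceOp> newlyPending,
                                     double fraction) {
        List<SliceOp> candidates = new ArrayList<>(carriedOver);
        candidates.addAll(newlyPending);
        candidates.sort(Comparator.comparingLong(SliceOp::earliestModTime));

        int limit = (int) Math.ceil(candidates.size() * fraction);
        limit = Math.max(0, Math.min(limit, candidates.size()));
        return new ArrayList<>(candidates.subList(0, limit));
    }

    public static void main(String[] args) {
        List<SliceOp> carriedOver = List.of(new SliceOp("f1", List.of(300L, 100L)));
        List<SliceOp> newlyPending = List.of(
                new SliceOp("f2", List.of(50L)),
                new SliceOp("f3", List.of(200L, 400L)),
                new SliceOp("f4", List.of(150L)));

        // Take half of the four candidates in this batch.
        for (SliceOp op : selectBatch(carriedOver, newlyPending, 0.5)) {
            System.out.println(op.fileId + " earliestModTime=" + op.earliestModTime());
        }
    }
}
```

Carrying the incomplete operations into the same candidate list before sorting
means they compete on modification time like any other slice; an actual
implementation might instead always schedule them first.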