[ 
https://issues.apache.org/jira/browse/HUDI-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4313:
-----------------------------
    Description: 
We need to support a new compaction strategy called 
LogFileModTimeBasedCompactionStrategy.

Using this strategy, we want to select file slices for compaction by the 
earliest modification time of their log files. 

This will be similar to LogFileSizeBasedCompactionStrategy, except that instead 
of comparing the total log file size of each file slice, we will compare the 
earliest log file modification time of each file slice. 


The goal is to compact only part of the whole change set (say, 20%) in one 
batch.
The compaction plan for the next batch should include incomplete operations 
from the previous plans.
Operations should be processed in order of earliest log file modification time.

  was:
We need to support a new compaction strategy called 
LogFileModTimeBasedCompactionStrategy.

Using this strategy, we want to select file slices for compaction by the 
earliest modification time of their log files. 

This will be similar to LogFileSizeBasedCompactionStrategy, except that instead 
of comparing the total log file size of each file slice, we will compare the 
earliest log file modification time of each file slice. 


> Support LogFileModTimeBasedCompactionStrategy
> ---------------------------------------------
>
>                 Key: HUDI-4313
>                 URL: https://issues.apache.org/jira/browse/HUDI-4313
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: compaction
>            Reporter: sivabalan narayanan
>            Priority: Major
>
> We need to support a new compaction strategy called 
> LogFileModTimeBasedCompactionStrategy.
> Using this strategy, we want to select file slices for compaction by the 
> earliest modification time of their log files. 
> This will be similar to LogFileSizeBasedCompactionStrategy, except that 
> instead of comparing the total log file size of each file slice, we will 
> compare the earliest log file modification time of each file slice. 
> The goal is to compact only part of the whole change set (say, 20%) in one 
> batch.
> The compaction plan for the next batch should include incomplete operations 
> from the previous plans.
> Operations should be processed in order of earliest log file modification 
> time.
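
The ordering and batching described above could be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the 
proposed Hudi implementation: SliceOp, selectBatch and the fraction parameter 
are hypothetical names, no Hudi classes are used, and "in order of earliest 
log file modification time" is assumed to mean oldest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT Hudi classes.
final class ModTimeCompactionSketch {

  /** One pending compaction operation for a file slice (hypothetical model). */
  static final class SliceOp {
    final String fileId;
    final List<Long> logFileModTimesMs; // modification time of each log file

    SliceOp(String fileId, List<Long> logFileModTimesMs) {
      this.fileId = fileId;
      this.logFileModTimesMs = logFileModTimesMs;
    }

    /** Earliest log file mod time; a slice with no log files sorts last. */
    long earliestModTimeMs() {
      long min = Long.MAX_VALUE;
      for (long t : logFileModTimesMs) {
        min = Math.min(min, t);
      }
      return min;
    }
  }

  /**
   * Orders operations by earliest log file mod time (assumed: oldest first)
   * and keeps roughly `fraction` of them, at least one if input is non-empty.
   */
  static List<SliceOp> selectBatch(List<SliceOp> pending, double fraction) {
    List<SliceOp> ordered = new ArrayList<>(pending);
    ordered.sort(Comparator.comparingLong(SliceOp::earliestModTimeMs));
    if (ordered.isEmpty()) {
      return ordered;
    }
    int limit = Math.max(1, (int) Math.ceil(ordered.size() * fraction));
    return new ArrayList<>(ordered.subList(0, Math.min(limit, ordered.size())));
  }
}
```

If the pending list passed to selectBatch still contains the operations that 
an earlier plan did not finish, those operations carry the oldest log file 
modification times and so sort to the front of the next batch. Calling 
selectBatch(pending, 0.2) would correspond to the 20% example above.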



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
