[ 
https://issues.apache.org/jira/browse/HUDI-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791711#comment-17791711
 ] 

sivabalan narayanan commented on HUDI-6979:
-------------------------------------------

This will definitely be a good addition.

> support EventTimeBasedCompactionStrategy
> ----------------------------------------
>
>                 Key: HUDI-6979
>                 URL: https://issues.apache.org/jira/browse/HUDI-6979
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: compaction
>            Reporter: Kong Wei
>            Assignee: Kong Wei
>            Priority: Major
>
> The current compaction strategies are based on log file size, the number 
> of log files, and similar metrics. With these strategies, there is no way 
> to control the time range of the data in the RO (read-optimized) table. 
> Hudi also has a DayBased strategy, but it relies on a day-based partition 
> path, and its time granularity is coarse.
> The *EventTimeBasedCompactionStrategy* can generate event-time-friendly 
> RO tables, whether or not the table is partitioned by day. For example, 
> the strategy can select all log files whose data time is before 3 am for 
> compaction, so that the generated RO table contains the data before 3 am. 
> If we only want to query data before 3 am, we can query the RO table 
> directly, which is much faster.
> With this strategy, I think we can expand the application scenarios of RO 
> tables.
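
The selection rule described in the issue can be sketched roughly as follows. This is only an illustration and not the proposed implementation: `LogFile`, `maxEventTime`, and `selectForCompaction` are hypothetical names that do not come from Hudi, and the sketch assumes each log file exposes the latest event time it contains.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class EventTimeSelectionSketch {

    // Hypothetical stand-in for a log file; not a Hudi class.
    public static final class LogFile {
        public final String path;
        public final Instant maxEventTime; // assumed: latest event time in the file

        public LogFile(String path, Instant maxEventTime) {
            this.path = path;
            this.maxEventTime = maxEventTime;
        }
    }

    // Keep only log files whose newest record is strictly before the cutoff,
    // so that compacting them yields base files holding data before the cutoff.
    public static List<LogFile> selectForCompaction(List<LogFile> logFiles, Instant cutoff) {
        return logFiles.stream()
                .filter(f -> f.maxEventTime.isBefore(cutoff))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant cutoff = Instant.parse("2023-11-30T03:00:00Z"); // the "3 am" example
        List<LogFile> files = Arrays.asList(
                new LogFile("part-0.log", Instant.parse("2023-11-30T02:10:00Z")),
                new LogFile("part-1.log", Instant.parse("2023-11-30T03:45:00Z")));
        for (LogFile f : selectForCompaction(files, cutoff)) {
            System.out.println(f.path);
        }
    }
}
```

A file whose latest event time equals or exceeds the cutoff is left for a later compaction, which is one way to keep the RO table limited to data before the cutoff. A real strategy would also need to decide how to handle file groups whose log files straddle the cutoff.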



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
