hudi-bot opened a new issue, #16273:
URL: https://github.com/apache/hudi/issues/16273

   The current compaction strategies are based on log file size, the number 
of log files, etc. With these strategies, the data time (event time) covered 
by the resulting RO table cannot be controlled. Hudi also has a DayBased 
strategy, but it relies on day-based partition paths, and its time granularity 
is coarse.
   
   
   The *EventTimeBasedCompactionStrategy* can generate event-time-friendly RO 
tables, whether or not the table uses day-based partitions. For example, the 
strategy can select all log files whose data time is before 3 am for 
compaction, so that the data in the generated RO table is before 3 am. If we 
only want to query data before 3 am, we can query the RO table alone, which is 
much faster.
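
   The selection rule described above can be sketched roughly as follows. This 
is only an illustration of the idea, not Hudi's API: `LogFile`, `FileGroup`, 
`max_event_time` and `select_for_compaction` are hypothetical names, and the 
sketch assumes each log file exposes the latest event time it contains. It also 
ignores any ordering constraints between log files of the same file group.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class LogFile:
    """Hypothetical stand-in for a log file and the latest event time it holds."""
    path: str
    max_event_time: datetime


@dataclass(frozen=True)
class FileGroup:
    """Hypothetical stand-in for a file group and its pending log files."""
    file_id: str
    log_files: List[LogFile]


def select_for_compaction(groups: List[FileGroup], cutoff: datetime) -> List[FileGroup]:
    """Keep, per file group, only the log files whose data is entirely before cutoff."""
    selected = []
    for group in groups:
        eligible = [lf for lf in group.log_files if lf.max_event_time < cutoff]
        if eligible:  # skip file groups with nothing old enough to compact
            selected.append(FileGroup(group.file_id, eligible))
    return selected


# Usage: compact only the log files whose data is before 3 am.
cutoff = datetime(2023, 11, 30, 3, 0)
groups = [
    FileGroup("fg-1", [
        LogFile("fg-1/log.1", datetime(2023, 11, 30, 2, 10)),
        LogFile("fg-1/log.2", datetime(2023, 11, 30, 4, 30)),
    ]),
    FileGroup("fg-2", [
        LogFile("fg-2/log.1", datetime(2023, 11, 30, 5, 0)),
    ]),
]
plan = select_for_compaction(groups, cutoff)
print([g.file_id for g in plan])  # ['fg-1']
```

   A real strategy would also need a reliable source for each log file's event 
time range; where that metadata would come from is left open here.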
   
   With this strategy, I think we can broaden the use cases for RO tables.
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-6979
   - Type: New Feature
   
   
   ---
   
   
   ## Comments
   
   30/Nov/23 16:53;shivnarayan;this will definitely be a good addition
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
