hudi-bot opened a new issue, #16273: URL: https://github.com/apache/hudi/issues/16273
The current compaction strategies are based on log file size, the number of log files, and similar metrics. The data time of the RO table generated by these strategies is therefore uncontrollable. Hudi also has a DayBased strategy, but it relies on a day-based partition path, and its time granularity is coarse.

The *EventTimeBasedCompactionStrategy* can generate event-time-friendly RO tables whether or not the table uses day-based partitions. For example, the strategy can select for compaction all log files whose data time is before 3 am, so that the data in the generated RO table is from before 3 am. If we only want to query data before 3 am, we can query just the RO table, which is much faster.

With this strategy, I think we can expand the application scenarios of RO tables.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-6979
- Type: New Feature

---

## Comments

30/Nov/23 16:53;shivnarayan;this will definitely be a good addition ;;;
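
To make the 3 am example from the issue description concrete, here is a minimal sketch of the selection rule in Python. It is not Hudi code: `LogFile`, `max_event_time`, and `select_for_compaction` are hypothetical names, and the sketch assumes each log file's latest event time is already known, which the issue does not specify.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogFile:
    """Hypothetical stand-in for a log file; not a Hudi class."""
    path: str
    # Assumption: the latest event time of any record in the file is available.
    max_event_time: datetime


def select_for_compaction(log_files, cutoff):
    """Return the log files whose data is entirely before the cutoff."""
    return [f for f in log_files if f.max_event_time < cutoff]


# Example: compact only data from before 3 am.
cutoff = datetime(2023, 11, 30, 3, 0)
candidates = [
    LogFile("fg1/.log.1", datetime(2023, 11, 30, 1, 15)),
    LogFile("fg1/.log.2", datetime(2023, 11, 30, 2, 59)),
    LogFile("fg2/.log.1", datetime(2023, 11, 30, 3, 40)),
]
selected = select_for_compaction(candidates, cutoff)
print([f.path for f in selected])
```

A strict `<` comparison is used so that a file containing a record stamped exactly at the cutoff is left out of this round; whether the real strategy should be inclusive is an open design choice.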
