WTa-hash edited a comment on issue #2229: URL: https://github.com/apache/hudi/issues/2229#issuecomment-722808651
> Just want to make sure if you understood compaction vs cleaning in Hudi. Why do you want to wait for 30 days before running compaction ? Do you mean cleaning the old versions 30 days back ?

Hello. I was just giving an example of one of my concerns with the number-of-commits compaction trigger. Here is my scenario:

1) A Spark structured streaming query reads from Kinesis.
2) Spark processes a batch.
3) The batch contains data for tables X, Y, and Z. My foreachBatch logic groups the records by table, then loops over the tables, so Hudi runs 3 times and processes each table sequentially. Hudi has INLINE_COMPACT_NUM_DELTA_COMMITS_PROP set to 10.
4) The next 9 batches contain data for tables X and Y, but none of them contain data for table Z.

Does this mean compaction will run for tables X and Y to compact the data from batches 1 to 10, while table Z will not be compacted?

If there were a way to compact based on time, we could set it to every 10 minutes and cold tables would be compacted too. It seems INLINE_COMPACT_NUM_DELTA_COMMITS_PROP will trigger compaction often on hot tables, but may not trigger at all on cold tables. My concern with cold tables is that it can take days or weeks before enough changes/commits come in to trigger compaction.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
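To illustrate the concern, here is a minimal sketch in plain Python (no Spark or Hudi involved; the function and variable names are hypothetical) that simulates a per-table delta-commit counter with a threshold of 10, mirroring INLINE_COMPACT_NUM_DELTA_COMMITS_PROP in the scenario above:

```python
NUM_DELTA_COMMITS_TO_TRIGGER = 10  # stands in for INLINE_COMPACT_NUM_DELTA_COMMITS_PROP

def run_batches(batches, threshold=NUM_DELTA_COMMITS_TO_TRIGGER):
    """Simulate a commit-count compaction trigger.

    Each batch is the set of tables it touches; a table's counter
    increments once per batch that contains data for it, and compaction
    fires (resetting the counter) when the counter reaches the threshold.
    Returns the set of tables that got compacted.
    """
    delta_commits = {}   # table -> delta commits since last compaction
    compacted = set()
    for batch in batches:
        for table in batch:
            delta_commits[table] = delta_commits.get(table, 0) + 1
            if delta_commits[table] >= threshold:
                compacted.add(table)
                delta_commits[table] = 0  # compaction resets the count
    return compacted

# Batch 1 touches X, Y and Z; the next 9 batches touch only X and Y.
batches = [{"X", "Y", "Z"}] + [{"X", "Y"}] * 9
print(run_batches(batches))  # X and Y compact; Z is stuck at 1 delta commit
```

Under this model the hot tables X and Y reach 10 delta commits and compact, while the cold table Z sits at 1 delta commit indefinitely, which is exactly the behavior I am worried about.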
