Re: [PR] [HUDI-8675] Flink schedule compaction with incremental partitions [hudi]

via GitHub Sun, 08 Dec 2024 23:53:15 -0800


danny0405 commented on PR #12448:
URL: https://github.com/apache/hudi/pull/12448#issuecomment-2527182857


   Nice feature. Incremental compaction and clustering have long been a 
strong request from community users, and they are valuable. I have some thoughts 
here:
   
   1. We should generalize incremental plan scheduling so that it covers both 
compaction and clustering.
   2. Try to avoid relying on Flink state as the backing state store; we 
might still use the timeline to fetch the incremental partitions added to 
the table, as we do for incremental cleaning.
   3. If there is a pending compaction in partition `p1`, and new logs have 
been written to `p1` since the last compaction, how can we continue to schedule 
a plan for `p1` in the next compaction scheduling run?
   4. We need to consider whether to design a new interface for incremental 
compaction planning; previously we passed all the table partitions to 
`CompactionStrategy.filterPartitionPaths`.
   
   It would be great if we could file an RFC to address these issues.
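   
   To make points 2 and 3 concrete, here is a rough sketch of deriving the 
incremental partitions from the timeline instead of from Flink state. It is 
not Hudi code: `IncrementalPartitionSketch`, `Commit`, and 
`incrementalPartitions` are hypothetical names, and it assumes instant times 
are strings that sort lexicographically in commit order. Under those 
assumptions, a partition such as `p1` that already has a pending plan is 
picked up again as soon as a later delta commit writes to it.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not part of the Hudi code base.
class IncrementalPartitionSketch {

    // Stand-in for one completed delta commit on the timeline (assumed shape).
    static final class Commit {
        final String instantTime;
        final Set<String> writtenPartitions;

        Commit(String instantTime, String... partitions) {
            this.instantTime = instantTime;
            this.writtenPartitions = new LinkedHashSet<>(Arrays.asList(partitions));
        }
    }

    // Collects the partitions written by commits strictly after the instant
    // of the last scheduled plan. A null lastPlanInstant means no plan has
    // been scheduled yet, so every written partition is returned.
    static Set<String> incrementalPartitions(List<Commit> timeline, String lastPlanInstant) {
        Set<String> result = new TreeSet<>();
        for (Commit commit : timeline) {
            boolean afterLastPlan =
                lastPlanInstant == null || commit.instantTime.compareTo(lastPlanInstant) > 0;
            if (afterLastPlan) {
                result.addAll(commit.writtenPartitions);
            }
        }
        return result;
    }
}
```

   For example, with commits `001` (writes `p1`, `p2`), `002` (writes `p1`), 
and `003` (writes `p1`, `p3`), and the last plan scheduled at `002`, the next 
scheduling run would consider `p1` and `p3`. How this interacts with file 
groups already locked by the pending plan is still open and is one of the 
things the RFC would need to settle.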


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
