danny0405 commented on PR #12448: URL: https://github.com/apache/hudi/pull/12448#issuecomment-2527182857
Nice feature. Incremental compaction and clustering have long been a strong request from community users, and they are valuable. I have some thoughts here:

1. We should generalize incremental plan scheduling so that it covers both compaction and clustering.
2. Try to avoid relying on Flink state as the backing state store; we could still use the timeline to fetch the partitions incrementally added to the table, as we do for incremental cleaning.
3. If there is a pending compaction in partition `p1`, and `p1` has received new log files since the last compaction, how do we continue to schedule a plan for `p1` in the next compaction scheduling run?
4. We need to consider whether to design a new interface for incremental compaction planning; previously we passed all the table partitions to `CompactionStrategy.filterPartitionPaths`.

It would be great if we could file an RFC to address these issues.
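
To make points 2 and 3 concrete, here is a minimal sketch of deriving the incremental partitions from the timeline rather than from Flink state. It does not use any Hudi API: `Commit`, `partitionsWrittenSince`, and `lastScheduledInstant` are hypothetical names for illustration, and it assumes instant times are fixed-width timestamp strings that compare lexicographically. Under that assumption, anchoring the window at the last *scheduled* plan (not the last completed one) would be one way to pick up `p1` again if it received new logs while its compaction was still pending.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class IncrementalPartitionSketch {

    /** Hypothetical stand-in for a completed delta commit on the timeline. */
    static final class Commit {
        final String instantTime;
        final Set<String> partitions;

        Commit(String instantTime, Set<String> partitions) {
            this.instantTime = instantTime;
            this.partitions = partitions;
        }
    }

    /**
     * Returns the partitions written by commits strictly after the given instant.
     * A null instant means no plan was scheduled yet, so every partition qualifies.
     */
    static Set<String> partitionsWrittenSince(List<Commit> timeline, String lastScheduledInstant) {
        Set<String> result = new TreeSet<>();
        for (Commit c : timeline) {
            if (lastScheduledInstant == null
                    || c.instantTime.compareTo(lastScheduledInstant) > 0) {
                result.addAll(c.partitions);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed scenario: a compaction plan was scheduled at "002" and is still pending.
        List<Commit> timeline = List.of(
            new Commit("001", Set.of("p1", "p2")),
            new Commit("003", Set.of("p1")),   // new logs land in p1 after the plan
            new Commit("004", Set.of("p3")));
        System.out.println(partitionsWrittenSince(timeline, "002")); // prints [p1, p3]
    }
}
```

Whether the real planner should anchor on the scheduled or the completed instant, and how this maps onto the existing strategy interface, is exactly the kind of question the RFC would need to settle.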
