kbuci opened a new pull request, #18172: URL: https://github.com/apache/hudi/pull/18172
### Describe the issue this Pull Request addresses

When the single clustering group config is **disabled** (`hoodie.clustering.plan.strategy.single.group.clustering.enabled=false`), the clustering plan strategy could still create clustering groups with exactly one input file and one output file. Clustering one file into one file has no benefit and wastes resources. This fix ensures that when single-group clustering is disabled, such no-op groups are not created.

### Summary and Changelog

**Summary:** When single-group clustering is disabled, clustering no longer schedules groups that would cluster one file into one file. All other clustering behavior is unchanged.

**Changelog:**

- **PartitionAwareClusteringPlanStrategy**: If `isSingleGroupClusteringEnabled` is false, clustering groups with one input file slice and one output file are skipped.

### Impact

- **Public API:** None.
- **Performance:** Reduces unnecessary clustering work and scheduling for single-file partitions when the config is disabled.

### Risk Level

**Low.** The change only affects the case where single-group clustering is disabled and a group would have 1 input and 1 output; all other behavior is unchanged. The logic is covered by the new unit test `testRemaningFileInPartitionNotClustered()`.

### Documentation Update

None. This is a behavioral fix for an existing config; no new config or default change.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable
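
For illustration only, the skip rule described in this PR can be sketched as a standalone filter. This is a minimal sketch, not the actual change to `PartitionAwareClusteringPlanStrategy`: the class `ClusteringGroupFilterSketch`, the `Group` type, and the methods `isNoOp` and `filter` are hypothetical names that do not exist in Hudi, and the boolean parameter stands in for the value of `hoodie.clustering.plan.strategy.single.group.clustering.enabled`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types or methods are part of Hudi.
class ClusteringGroupFilterSketch {

    // Stand-in for a candidate clustering group: how many file slices go in
    // and how many output files the group is expected to produce.
    static final class Group {
        final int numInputSlices;
        final int numOutputFiles;

        Group(int numInputSlices, int numOutputFiles) {
            this.numInputSlices = numInputSlices;
            this.numOutputFiles = numOutputFiles;
        }
    }

    // A group is a no-op when single-group clustering is disabled and it
    // would rewrite exactly one input file slice into exactly one output file.
    static boolean isNoOp(Group group, boolean singleGroupClusteringEnabled) {
        return !singleGroupClusteringEnabled
                && group.numInputSlices == 1
                && group.numOutputFiles == 1;
    }

    // Keep every candidate group except the no-op ones.
    static List<Group> filter(List<Group> candidates, boolean singleGroupClusteringEnabled) {
        List<Group> kept = new ArrayList<>();
        for (Group group : candidates) {
            if (!isNoOp(group, singleGroupClusteringEnabled)) {
                kept.add(group);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Group> candidates = new ArrayList<>();
        candidates.add(new Group(1, 1)); // one file into one file
        candidates.add(new Group(3, 1)); // three small files merged into one
        candidates.add(new Group(1, 2)); // one large file split into two

        System.out.println("kept with config disabled: " + filter(candidates, false).size());
        System.out.println("kept with config enabled:  " + filter(candidates, true).size());
    }
}
```

With the config disabled, only the one-into-one group is dropped; groups that merge or split files are kept, and with the config enabled nothing is filtered.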
