Re: [PR] perf: [Flink] Skip clustering for partitions with only one left small file in FlinkClusteringPlanStrategy [hudi]

via GitHub Sat, 18 Oct 2025 10:04:19 -0700


zoomake commented on code in PR #14087:
URL: https://github.com/apache/hudi/pull/14087#discussion_r2431017365



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/FlinkSizeBasedClusteringPlanStrategy.java:
##########
@@ -72,8 +73,18 @@ protected Map<String, String> getStrategyParams() {
 
   @Override
   protected Stream<FileSlice> getFileSlicesEligibleForClustering(final String partition) {
-    return super.getFileSlicesEligibleForClustering(partition)
-        // Only files that have base file size smaller than small file size are eligible.
-        .filter(slice -> slice.getBaseFile().map(HoodieBaseFile::getFileSize).orElse(0L) < getWriteConfig().getClusteringSmallFileLimit());
+    Supplier<Stream<FileSlice>> streamSupplier = () -> super.getFileSlicesEligibleForClustering(partition)

Review Comment:
   Thanks for the review! 
   
   I have updated the implementation to simplify the file count logic and added a check for clustering sort columns, as you suggested.
   
   Could you please take another look and review the changes? Thanks! @danny0405
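
For readers of this thread, the intended behavior can be sketched roughly as below. This is a simplified, self-contained illustration and not the code in the PR: the class `SmallFileClusteringSketch`, the method `eligibleForClustering`, and the use of plain file sizes in place of `FileSlice` objects are all hypothetical. It also assumes that a single remaining small file is still worth rewriting when sort columns are configured, which is an interpretation of the sort-column check mentioned above.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; names and types are illustrative, not Hudi APIs.
final class SmallFileClusteringSketch {

    /**
     * Returns the base file sizes in one partition that would be picked for clustering.
     *
     * @param baseFileSizes  sizes of the base files in the partition, in bytes
     * @param smallFileLimit files strictly smaller than this are "small"
     * @param hasSortColumns whether clustering sort columns are configured (assumed flag)
     */
    static List<Long> eligibleForClustering(List<Long> baseFileSizes,
                                            long smallFileLimit,
                                            boolean hasSortColumns) {
        // Keep only files below the small-file limit, as in the original filter.
        List<Long> smallFiles = baseFileSizes.stream()
            .filter(size -> size < smallFileLimit)
            .collect(Collectors.toList());

        // Assumption: one small file has nothing to merge with, so rewriting it
        // is only useful when the data would also be re-sorted.
        if (smallFiles.size() == 1 && !hasSortColumns) {
            return Collections.emptyList();
        }
        return smallFiles;
    }
}
```

Collecting the filtered result once avoids re-evaluating the stream just to count it, which is one way to keep the file count logic simple.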
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
