Re: [PR] [HUDI-14086] [Flink] Skip clustering for partitions with only one left small file in FlinkClusteringPlanStrategy [hudi]

via GitHub Tue, 14 Oct 2025 20:12:45 -0700


danny0405 commented on code in PR #14087:
URL: https://github.com/apache/hudi/pull/14087#discussion_r2430994768



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/FlinkSizeBasedClusteringPlanStrategy.java:
##########
@@ -72,8 +73,18 @@ protected Map<String, String> getStrategyParams() {
 
   @Override
   protected Stream<FileSlice> getFileSlicesEligibleForClustering(final String partition) {
-    return super.getFileSlicesEligibleForClustering(partition)
-        // Only files that have base file size smaller than small file size are eligible.
-        .filter(slice -> slice.getBaseFile().map(HoodieBaseFile::getFileSize).orElse(0L) < getWriteConfig().getClusteringSmallFileLimit());
+    Supplier<Stream<FileSlice>> streamSupplier = () -> super.getFileSlicesEligibleForClustering(partition)

Review Comment:
   Each file slice is built from a distinct base file, so simply counting the
   stream should work: `stream().count()`.
   We should also check the strategy's sort columns: if any sort columns are
   declared, we cannot skip the clustering.
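
   A minimal sketch of the check described above, not the actual Hudi change.
   `Slice`, `eligibleSlices`, `smallFileLimit`, and `sortColumns` are
   hypothetical stand-ins for the real `FileSlice`, the strategy method, and
   the write config values. It collects the small slices once (a Java stream
   can only be consumed a single time), then skips the partition only when at
   most one small file is left and no sort columns are declared.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ClusteringSkipSketch {

  /** Hypothetical stand-in for a file slice; only the base file size matters here. */
  public static final class Slice {
    public final long baseFileSize;

    public Slice(long baseFileSize) {
      this.baseFileSize = baseFileSize;
    }
  }

  /**
   * Returns the small slices to cluster, or an empty list when the partition can be skipped.
   * All parameters are assumptions standing in for values the real strategy reads from config.
   */
  public static List<Slice> eligibleSlices(List<Slice> slices, long smallFileLimit, String sortColumns) {
    // Keep only slices whose base file is below the small-file limit.
    List<Slice> small = slices.stream()
        .filter(s -> s.baseFileSize < smallFileLimit)
        .collect(Collectors.toList());

    // Declared sort columns mean even a single file may still need rewriting in sorted order.
    boolean hasSortColumns = sortColumns != null && !sortColumns.trim().isEmpty();

    // Rewriting zero or one small file gains nothing unless a sort is requested.
    if (small.size() <= 1 && !hasSortColumns) {
      return Collections.emptyList();
    }
    return small;
  }

  public static void main(String[] args) {
    List<Slice> partition = Arrays.asList(new Slice(10L), new Slice(500L));
    System.out.println(eligibleSlices(partition, 100L, "").size());
    System.out.println(eligibleSlices(partition, 100L, "ts").size());
  }
}
```

   Collecting to a list is a design choice of this sketch: it avoids the
   `Supplier<Stream<FileSlice>>` needed to traverse the stream twice, at the
   cost of materializing the small slices of one partition.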


