zoomake commented on code in PR #14087:
URL: https://github.com/apache/hudi/pull/14087#discussion_r2438380891
##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/FlinkSizeBasedClusteringPlanStrategy.java:
##########
@@ -72,8 +73,18 @@ protected Map<String, String> getStrategyParams() {
@Override
protected Stream<FileSlice> getFileSlicesEligibleForClustering(final String partition) {
- return super.getFileSlicesEligibleForClustering(partition)
- // Only files that have base file size smaller than small file size are eligible.
- .filter(slice -> slice.getBaseFile().map(HoodieBaseFile::getFileSize).orElse(0L) < getWriteConfig().getClusteringSmallFileLimit());
+ Supplier<Stream<FileSlice>> streamSupplier = () -> super.getFileSlicesEligibleForClustering(partition)
Review Comment:
@danny0405 Thanks! I’ve updated the code to collect the stream into a list
to avoid multiple materializations.
The CI failure seems unrelated to this change. Please help review again when
you have time.
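
For context, here is a minimal sketch of why collecting into a list avoids the repeated work. It is not the PR code: `StreamReuseSketch`, `fileSizes()`, the `SOURCE_BUILDS` counter, and the "at least two candidates" check are all hypothetical stand-ins, with plain `Long` sizes in place of `FileSlice`. A Java `Stream` can only be consumed once, so a `Supplier<Stream<...>>` has to rebuild the source on every `get()`, while a collected `List` is built once and can be traversed repeatedly.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamReuseSketch {
  // Counts how many times the (hypothetical) upstream source is built.
  static final AtomicInteger SOURCE_BUILDS = new AtomicInteger();

  // Hypothetical stand-in for the parent lookup; returns base file sizes.
  static Stream<Long> fileSizes() {
    SOURCE_BUILDS.incrementAndGet();
    return Stream.of(10L, 200L, 30L, 400L);
  }

  // Supplier approach: each get() rebuilds the source, so two passes cost two builds.
  static List<Long> smallFilesViaSupplier(long limit) {
    Supplier<Stream<Long>> supplier = StreamReuseSketch::fileSizes;
    if (supplier.get().count() < 2) { // first pass (assumed check, for illustration)
      return Collections.emptyList();
    }
    return supplier.get() // second pass rebuilds the source
        .filter(size -> size < limit)
        .collect(Collectors.toList());
  }

  // List approach: the source is built once and the list is reused for both passes.
  static List<Long> smallFilesViaList(long limit) {
    List<Long> all = fileSizes().collect(Collectors.toList());
    if (all.size() < 2) {
      return Collections.emptyList();
    }
    return all.stream()
        .filter(size -> size < limit)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    smallFilesViaSupplier(100L);
    System.out.println("supplier builds: " + SOURCE_BUILDS.get());
    SOURCE_BUILDS.set(0);
    smallFilesViaList(100L);
    System.out.println("list builds: " + SOURCE_BUILDS.get());
  }
}
```

Both methods return the same result; the difference is only how often the upstream lookup runs, which matters when that lookup is expensive.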