hudi-agent commented on code in PR #18945:
URL: https://github.com/apache/hudi/pull/18945#discussion_r3382301471
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/ClusteringPlanStrategy.java:
##########
@@ -137,6 +140,20 @@ protected Stream<FileSlice>
getFileSlicesEligibleForClustering(String partition)
*/
protected abstract Map<String, String> getStrategyParams();
+ /**
+ * Keep partitions from the current scheduling window that are not scheduled
in this plan as missing
+ * partitions so that they can be picked up by later incremental clustering
schedules.
+ */
+ protected List<String> getMissingPartitionsFromCurrentWindow(List<String>
partitionsToSchedule,
+ List<String>
partitionsInCurrentWindow) {
+ if (!getWriteConfig().isIncrementalTableServiceEnabled()) {
+ return Collections.emptyList();
+ }
+ Set<String> missingPartitions = new
LinkedHashSet<>(partitionsInCurrentWindow);
Review Comment:
🤖 Returning `Collections.emptyList()` here gives the caller an immutable
list. In `PartitionAwareClusteringPlanStrategy.generateClusteringPlan` the
regex branch now does `missingPartitions =
getMissingPartitionsFromCurrentWindow(...)` and a few lines later runs
`missingPartitions.addAll(partitionsPair.getRight())`. When incremental table
service is disabled AND filter mode is `RECENT_DAYS`,
`partitionsPair.getRight()` can be non-empty (see
`ClusteringPlanPartitionFilter.recentDaysFilter`), so the `addAll` will throw
`UnsupportedOperationException`. Previously this code path kept
`missingPartitions` as the mutable `new ArrayList<>()` initializer, so it
worked. Could you return `new ArrayList<>()` here (or have the caller avoid
reassigning when disabled)?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/PartitionAwareClusteringPlanStrategy.java:
##########
@@ -169,13 +169,15 @@ public Option<HoodieClusteringPlan>
generateClusteringPlan(ClusteringPlanActionE
if (StringUtils.isNullOrEmpty(partitionSelected)) {
// get matched partitions if set
- partitionPaths = getRegexPatternMatchedPartitions(config,
partitions.get());
+ // partitionsInCurrentWindow = incremental partitions + missing
partitions in last plan
+ List<String> partitionsInCurrentWindow = partitions.get();
+ partitionPaths = getRegexPatternMatchedPartitions(config,
partitionsInCurrentWindow);
+ missingPartitions =
getMissingPartitionsFromCurrentWindow(partitionPaths,
partitionsInCurrentWindow);
// filter the partition paths if needed to reduce list status
} else {
partitionPaths = Arrays.asList(partitionSelected.split(","));
- // Users may temporarily set specific partitions for clustering.
- // Ensure the coherence of the missing partitions.
- missingPartitions =
(List<String>)executor.fetchMissingPartitions(TableServiceType.CLUSTER).getRight();
+ missingPartitions = getMissingPartitionsFromCurrentWindow(partitionPaths,
Review Comment:
🤖 nit: the ternary here looks asymmetric with the `if`-branch above (which
calls `getMissingPartitionsFromCurrentWindow` directly), and it's not obvious
why. Since `partitions` is `Lazy`, the guard exists to avoid triggering an
expensive load when not needed — could you add a short inline comment like `//
avoid triggering lazy partition load when incremental is disabled` to make the
intent clear?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]