[ https://issues.apache.org/jira/browse/HUDI-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-5318:
---------------------------------
Labels: pull-request-available (was: )
> Clustering scheduling now lists all partitions in the table when
> PARTITION_SELECTED is set
> ----------------------------------------------------------------------------------------
>
> Key: HUDI-5318
> URL: https://issues.apache.org/jira/browse/HUDI-5318
> Project: Apache Hudi
> Issue Type: Bug
> Components: clustering
> Reporter: Qijun Fu
> Assignee: Qijun Fu
> Priority: Major
> Labels: pull-request-available
>
> Currently, PartitionAwareClusteringPlanStrategy lists all partitions in the
> table whether or not PARTITION_SELECTED is set. Listing all partitions in the
> dataset is a very expensive operation when the number of partitions is huge.
> We can skip listing all partitions when PARTITION_SELECTED is set, so that
> clustering scheduling can benefit greatly from partition pruning.
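A minimal sketch of the proposed behavior, assuming PARTITION_SELECTED holds a comma-separated list of partition paths. The class PartitionSelectionSketch, the method partitionsToCluster, and the listAllPartitions supplier are hypothetical stand-ins, not Hudi's actual PartitionAwareClusteringPlanStrategy API. When a selection is configured, the expensive full listing is never invoked; otherwise the code falls back to listing every partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch only: not Hudi's real clustering plan strategy code.
public class PartitionSelectionSketch {

    /**
     * Returns the partitions to consider for clustering.
     * Assumption: selectedPartitions is a comma-separated list of partition paths.
     * If it is non-empty, it is used directly and listAllPartitions is never called.
     */
    static List<String> partitionsToCluster(String selectedPartitions,
                                            Supplier<List<String>> listAllPartitions) {
        if (selectedPartitions != null && !selectedPartitions.trim().isEmpty()) {
            List<String> selected = new ArrayList<>();
            for (String p : selectedPartitions.split(",")) {
                String trimmed = p.trim();
                if (!trimmed.isEmpty()) {
                    selected.add(trimmed);
                }
            }
            return selected;
        }
        // Fallback: no selection configured, so list every partition in the table.
        return listAllPartitions.get();
    }

    public static void main(String[] args) {
        // Counts how many times the (expensive) full listing runs.
        int[] listingCalls = {0};
        Supplier<List<String>> lister = () -> {
            listingCalls[0]++;
            return Arrays.asList("2022/12/01", "2022/12/02", "2022/12/03");
        };

        System.out.println(partitionsToCluster("2022/12/02", lister)); // [2022/12/02]
        System.out.println(listingCalls[0]);                           // 0
        System.out.println(partitionsToCluster(null, lister));         // [2022/12/01, 2022/12/02, 2022/12/03]
        System.out.println(listingCalls[0]);                           // 1
    }
}
```

Passing the listing as a Supplier keeps it lazy, so it only runs on the fallback path; the call counter in main shows the full listing is skipped entirely when a selection is given.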
--
This message was sent by Atlassian Jira
(v8.20.10#820010)