Qijun Fu created HUDI-5318:
------------------------------
Summary: Clustering scheduling lists all partitions in the table
even when PARTITION_SELECTED is set
Key: HUDI-5318
URL: https://issues.apache.org/jira/browse/HUDI-5318
Project: Apache Hudi
Issue Type: Bug
Components: clustering
Reporter: Qijun Fu
Assignee: Qijun Fu
Currently PartitionAwareClusteringPlanStrategy lists all partitions in the
table whether or not PARTITION_SELECTED is set. Listing all partitions in the
dataset is a very expensive operation when the number of partitions is huge.
We can skip listing all partitions when PARTITION_SELECTED is set, so that
clustering scheduling can benefit a lot from partition pruning.
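
A minimal sketch of the proposed behaviour, not the actual Hudi code: the
class PartitionSelectionSketch, the method resolvePartitions, and the
listAllPartitions supplier are hypothetical names, and the sketch assumes
PARTITION_SELECTED holds a comma-separated list of partition paths. The
expensive listing is only invoked when no partitions were selected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class PartitionSelectionSketch {

    /**
     * Hypothetical helper (not part of Hudi's API).
     *
     * @param selectedPartitions value of PARTITION_SELECTED, assumed here to be
     *                           a comma-separated list; null or blank if unset
     * @param listAllPartitions  stand-in for the expensive full partition listing
     */
    public static List<String> resolvePartitions(String selectedPartitions,
                                                 Supplier<List<String>> listAllPartitions) {
        if (selectedPartitions != null && !selectedPartitions.trim().isEmpty()) {
            List<String> selected = new ArrayList<>();
            for (String partition : selectedPartitions.split(",")) {
                String trimmed = partition.trim();
                if (!trimmed.isEmpty()) {
                    selected.add(trimmed);
                }
            }
            if (!selected.isEmpty()) {
                // Partitions were named explicitly: skip the full listing.
                return selected;
            }
        }
        // Nothing selected: fall back to listing every partition.
        return listAllPartitions.get();
    }
}
```

With this shape, a call such as resolvePartitions("2022/12/01, 2022/12/02",
lister) returns the two named partitions without ever invoking lister, while
a null or blank selection still goes through the full listing.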
--
This message was sent by Atlassian Jira
(v8.20.10#820010)