Qijun Fu created HUDI-5318:
------------------------------

             Summary: Clustering scheduling lists all partitions in the table 
even when PARTITION_SELECTED is set
                 Key: HUDI-5318
                 URL: https://issues.apache.org/jira/browse/HUDI-5318
             Project: Apache Hudi
          Issue Type: Bug
          Components: clustering
            Reporter: Qijun Fu
            Assignee: Qijun Fu


Currently, PartitionAwareClusteringPlanStrategy lists all partitions in the table 
whether or not PARTITION_SELECTED is set. Listing all partitions in the dataset is 
a very expensive operation when the number of partitions is huge. We can skip 
the full listing when PARTITION_SELECTED is set, so that clustering 
scheduling can benefit from partition pruning.
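
The idea can be sketched roughly as follows. This is only an illustration, not Hudi's actual code: the class PartitionSelectionSketch, the method partitionsToCluster, and the Supplier standing in for the expensive full listing are all hypothetical names, and the sketch assumes PARTITION_SELECTED holds a comma-separated list of partition paths.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class PartitionSelectionSketch {

    /**
     * Hypothetical helper (not part of Hudi's API).
     * If partitionSelected is non-empty, parse it and return those partitions
     * without ever invoking the expensive full listing. Otherwise fall back
     * to listing every partition in the table.
     * Assumption: the value is a comma-separated list of partition paths.
     */
    static List<String> partitionsToCluster(String partitionSelected,
                                            Supplier<List<String>> listAllPartitions) {
        if (partitionSelected != null && !partitionSelected.trim().isEmpty()) {
            return Arrays.stream(partitionSelected.split(","))
                .map(String::trim)
                .filter(p -> !p.isEmpty())
                .collect(Collectors.toList());
        }
        // No explicit selection: the full (expensive) listing is unavoidable.
        return listAllPartitions.get();
    }

    public static void main(String[] args) {
        // Counter that records how often the full listing is actually performed.
        AtomicInteger listings = new AtomicInteger();
        Supplier<List<String>> expensiveListing = () -> {
            listings.incrementAndGet();
            return Arrays.asList("2022/12/01", "2022/12/02", "2022/12/03");
        };

        // Selection set: the supplier is never called.
        System.out.println(partitionsToCluster("2022/12/01,2022/12/02", expensiveListing));
        // Selection unset: the supplier is called once.
        System.out.println(partitionsToCluster(null, expensiveListing));
        System.out.println("full listings performed: " + listings.get());
    }
}
```

Passing the listing as a Supplier keeps it lazy, so the cost is only paid on the fallback path. Any further filtering the real strategy applies after this step is outside the scope of the sketch.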



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
