sivabalan narayanan created HUDI-5012:
-----------------------------------------

             Summary: Fix clean planning for very large partitions
                 Key: HUDI-5012
                 URL: https://issues.apache.org/jira/browse/HUDI-5012
             Project: Apache Hudi
          Issue Type: Improvement
          Components: cleaning
            Reporter: sivabalan narayanan


Within the clean planning phase, we do a map() over every partition and trigger
planning for each partition inside that call.
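The following is a minimal, hypothetical Python sketch of the current per-partition shape, not Hudi code. `plan_partition`, `plan_with_map` and `split_into_slices` are illustrative names. The "setup" counter is an assumption that stands in for whatever fixed cost planning one partition incurs. Partitions are split into at most `parallelism` contiguous slices, and each slice plans its partitions one at a time.

```python
import math
from typing import Dict, List, Tuple

Plan = Tuple[str, List[str]]  # (partition path, files to delete)


def split_into_slices(items: List[str], parallelism: int) -> List[List[str]]:
    # Contiguous slices, at most `parallelism` of them (one per task).
    if not items:
        return []
    size = math.ceil(len(items) / parallelism)
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_partition(partition: str, stats: Dict[str, int]) -> Plan:
    # Hypothetical per-partition planner: pays its setup cost on every call.
    stats["setup_calls"] += 1
    return (partition, [f"{partition}/old-file"])  # placeholder delete list


def plan_with_map(partitions: List[str], parallelism: int) -> Tuple[List[Plan], int, int]:
    # map()-style: each task applies plan_partition to its elements one by one.
    stats = {"setup_calls": 0}
    slices = split_into_slices(partitions, parallelism)
    plans = [plan_partition(p, stats) for s in slices for p in s]
    longest = max((len(s) for s in slices), default=0)
    return plans, stats["setup_calls"], longest
```

With 10,000 partitions and a parallelism of 4, `plan_with_map` returns a setup count of 10,000 and a longest slice of 2,500. Each task therefore works through 2,500 partitions in sequence.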

 

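With a very large number of partitions and a small cleaner shuffle parallelism,
each task plans many partitions one after another, so planning becomes largely
sequential. We can improve this by switching to a mapPartitions call, which hands
each task its whole batch of partitions, and optimizing planning per batch rather
than per partition.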
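The following is a minimal, hypothetical Python sketch of the mapPartitions idea. It assumes the optimization is to run expensive setup once per task instead of once per partition. The issue does not say what the optimized planner actually does. `plan_slice`, `plan_with_map_partitions` and `split_into_slices` are illustrative names. In Spark, the function passed to `RDD.mapPartitions` likewise receives an iterator over all elements of one partition.

```python
import math
from typing import Dict, Iterator, List, Tuple

Plan = Tuple[str, List[str]]  # (partition path, files to delete)


def split_into_slices(items: List[str], parallelism: int) -> List[List[str]]:
    # Contiguous slices, at most `parallelism` of them (one per task).
    if not items:
        return []
    size = math.ceil(len(items) / parallelism)
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_slice(partitions: Iterator[str], stats: Dict[str, int]) -> Iterator[Plan]:
    # mapPartitions-style function: it sees the whole slice through one iterator,
    # so the hypothetical setup (e.g. loading state for the batch) runs once.
    batch = list(partitions)
    stats["setup_calls"] += 1
    for partition in batch:
        yield (partition, [f"{partition}/old-file"])  # placeholder delete list


def plan_with_map_partitions(partitions: List[str], parallelism: int) -> Tuple[List[Plan], int]:
    stats = {"setup_calls": 0}
    plans: List[Plan] = []
    for s in split_into_slices(partitions, parallelism):
        plans.extend(plan_slice(iter(s), stats))
    return plans, stats["setup_calls"]
```

With 10,000 partitions and a parallelism of 4, setup runs 4 times instead of 10,000. Within each task the partitions are still visited in order, so a real fix would also need the batched work itself to be cheaper.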

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
