parisni opened a new issue, #5835:
URL: https://github.com/apache/hudi/issues/5835

   I am experiencing a major problem with cleaning: its duration increases linearly 
with the number of partitions.
   
   See the logs below, taken from a loop of batch inserts into a Hudi table with a growing number 
of partitions:
   ```
   Incremental Cleaning mode is enabled
   Total Partitions to clean : 6960, with policy KEEP_LATEST_COMMITS
   Total Partitions to clean : 7080, with policy KEEP_LATEST_COMMITS
   Total Partitions to clean : 7200, with policy KEEP_LATEST_COMMITS
   Total Partitions to clean : 7320, with policy KEEP_LATEST_COMMITS
   Total Partitions to clean : 7440, with policy KEEP_LATEST_COMMITS
   Total Partitions to clean : 7560, with policy KEEP_LATEST_COMMITS
   Total Partitions to clean : 7680, with policy KEEP_LATEST_COMMITS
   Total Partitions to clean : 7800, with policy KEEP_LATEST_COMMITS
   ```
   
   I debugged this, and the cleaner always falls back to brute-force partition 
cleaning on the file system, because the lastClean commit is always empty:
   
   
https://github.com/apache/hudi/blob/a048e940fd6e3f62e443bca5831e99144900a33f/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L155
   
   
https://github.com/apache/hudi/blob/a048e940fd6e3f62e443bca5831e99144900a33f/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L207-L214
   
   This makes the cleaner unusable on tables with a large number of 
partitions.
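
   The fallback described above can be modelled with a minimal sketch. This is not Hudi's code: the function `partitions_to_clean`, its parameters, and the sample instants are all hypothetical, and the real logic lives in the `CleanPlanner.java` lines linked above. It only shows why an empty last-clean instant forces a scan of every partition, even with incremental mode enabled.

```python
from typing import Dict, List, Optional, Set


def partitions_to_clean(
    all_partitions: List[str],
    incremental_enabled: bool,
    last_clean_instant: Optional[str],
    partitions_by_commit: Dict[str, Set[str]],
) -> List[str]:
    """Return the partitions a cleaner run would scan (hypothetical model)."""
    if incremental_enabled and last_clean_instant:
        # Incremental path: only partitions written by commits at or after
        # the instant recorded by the previous clean.
        touched: Set[str] = set()
        for instant, partitions in partitions_by_commit.items():
            if instant >= last_clean_instant:
                touched |= partitions
        return sorted(touched)
    # Fallback path: no usable previous clean (None or empty string),
    # so every partition in the table is listed.
    return sorted(all_partitions)


# Hypothetical table with 7800 partitions and three commits.
all_parts = [f"p{i:05d}" for i in range(7800)]
commits = {"003": {"p07799"}, "002": {"p07798"}, "001": {"p00001"}}

# Empty last-clean instant: the whole table is scanned.
print(len(partitions_to_clean(all_parts, True, None, commits)))
# Last-clean instant present: only recently written partitions are scanned.
print(len(partitions_to_clean(all_parts, True, "002", commits)))
```

   In this model the first call scans all 7800 partitions and the second scans only the two partitions touched since instant `002`, which is the gap between the behaviour observed in the logs and what incremental cleaning is expected to do.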

