parisni commented on issue #6373:
URL: https://github.com/apache/hudi/issues/6373#issuecomment-1213305473

   For 2., I suspect cleaning with the metadata table is slow because of file 
listing rather than partition listing. File listing goes through the metadata 
table, while partition listing is done directly on the filesystem: 
https://github.com/apache/hudi/blob/6e7ac457352e007939ba3c44c9dc197de7b88ed3/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L310
   
   One way to improve this would be to let file group listing fall back to the 
filesystem (the behavior without the metadata table).
   
   For 1., from debugging and reading the source code, incremental cleaning 
only kicks in once a previous clean has actually deleted files. After that, it 
only considers the commits that follow. I think this is both a performance 
problem (in my case and in the bulk-insert case) and something that can leave 
old partitions uncleaned. This is complicated to explain, and... I might be 
wrong.
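
   To make the behavior described for 1. a bit more concrete, here is a toy 
model in Python. It is not Hudi code: `Commit`, `LastClean` and 
`partitions_to_clean` are made-up names, and the logic only mirrors my reading 
of the planner, which may be wrong.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Commit:
    ts: int                 # commit instant, simplified to an integer
    partitions: Set[str]    # partitions written by this commit


@dataclass
class LastClean:
    earliest_retained_ts: int  # earliest commit the previous clean retained
    deleted_files: int         # number of files the previous clean deleted


def partitions_to_clean(
    commits: Iterable[Commit],
    all_partitions: Set[str],
    last_clean: Optional[LastClean],
) -> Set[str]:
    """Hypothetical model of how the set of partitions to scan is chosen."""
    # No previous clean, or a clean that deleted nothing: full scan.
    if last_clean is None or last_clean.deleted_files == 0:
        return set(all_partitions)
    # Incremental mode: only partitions touched by later commits are scanned.
    touched: Set[str] = set()
    for commit in commits:
        if commit.ts > last_clean.earliest_retained_ts:
            touched |= commit.partitions
    return touched


commits = [Commit(1, {"p=a"}), Commit(2, {"p=b"}), Commit(3, {"p=c"})]
everything = {"p=a", "p=b", "p=c"}

# Previous clean deleted nothing: every partition is listed again.
full = partitions_to_clean(commits, everything, LastClean(2, 0))
# Previous clean deleted files: only partitions of later commits are listed.
incremental = partitions_to_clean(commits, everything, LastClean(2, 5))
```

In this model, a bulk-insert workload whose cleans never delete anything always 
takes the full-scan branch (the performance problem), and once the incremental 
branch is taken, `p=a` and `p=b` are no longer looked at (the uncleaned old 
partitions).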
   


