bvaradar commented on issue #2240:
URL: https://github.com/apache/hudi/issues/2240#issuecomment-730037189


   Just looking at the timestamps for the last compaction, clean and delta commit operations:
   
   1. Compaction:
   2020-11-17 16:36:41 5097604 20201117163409.commit
   2020-11-17 16:34:21 0 20201117163409.compaction.inflight
   
   This means that compaction took around 2.3 mins to finish.
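   The durations here are plain differences between the timestamps in each listing. Below is a small Python sketch of that arithmetic. It assumes each listing line has the "date time size filename" layout quoted in this comment and that the times are the files' last-modified times. The helper name is hypothetical:

```python
from datetime import datetime


def instant_duration_seconds(listing_lines):
    """Seconds between the earliest and latest timeline file in a listing.

    Assumes each line looks like "<date> <time> <size> <file name>",
    as in the timeline listings quoted in this comment.
    """
    times = []
    for line in listing_lines:
        date_str, time_str, _size, _name = line.split()
        times.append(datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S"))
    return (max(times) - min(times)).total_seconds()


# Compaction instant, copied from the listing above.
compaction = [
    "2020-11-17 16:36:41 5097604 20201117163409.commit",
    "2020-11-17 16:34:21 0 20201117163409.compaction.inflight",
]
seconds = instant_duration_seconds(compaction)
print(f"compaction: {seconds:.0f} s (~{seconds / 60:.1f} min)")  # compaction: 140 s (~2.3 min)
```

   Because no `.compaction.requested` file appears in the listing, this measures only inflight to completed. The same helper can be applied unchanged to the ingestion and cleaning listings.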
   
   2. Ingestion:
   2020-11-17 16:34:02 5274215 20201117162434.deltacommit
   2020-11-17 16:30:26 3916496 20201117162434.deltacommit.inflight
   2020-11-17 16:26:00 0 20201117162434.deltacommit.requested
   The data write part alone took ~8 mins (requested at 16:26:00, completed at 16:34:02).
   
   
   3. Cleaning:
   2020-11-17 16:49:44 1798629 20201117162434.clean
   2020-11-17 16:26:53 1665412 20201117162434.clean.inflight
   2020-11-17 16:26:52 1665412 20201117162434.clean.requested
   
   Cleaning seems to have taken around 23 mins (16:26:52 to 16:49:44). With a single partition, the listing performance should be similar for ingestion and cleaning. If cleaning was run asynchronously, you might need to give it more executors so the clean can run in parallel. I am also wondering whether the file deletes are being throttled in your case, which would slow down cleaning. Can you take a look at the executor logs to see if each file deletion is taking a long time? If the slowdown is not due to delete throttling, you can also try disabling automatic cleaning ("hoodie.clean.automatic=false") and enabling it only every N writes.
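   One way to enable cleaning only every N writes is to compute the option per batch. Below is a minimal Python sketch. It assumes you pass Hudi configs as per-write options and that setting `hoodie.clean.automatic` on a given write controls whether that write triggers a clean. The helper `hudi_clean_options` and the `clean_every_n` knob are hypothetical:

```python
def hudi_clean_options(write_index: int, clean_every_n: int) -> dict:
    """Hypothetical helper: turn on automatic cleaning only on every Nth write.

    write_index counts writes starting at 1; clean_every_n is a value you choose.
    """
    if clean_every_n < 1:
        raise ValueError("clean_every_n must be >= 1")
    run_clean = write_index % clean_every_n == 0
    # Hudi options are passed as strings, so use "true"/"false" rather than bools.
    return {"hoodie.clean.automatic": "true" if run_clean else "false"}


for i in range(1, 7):
    print(i, hudi_clean_options(i, clean_every_n=3))
```

   Merge the returned dict into the options you already pass to the Hudi writer for that batch. Skipped cleans leave older file versions in place until the next write that runs one.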

