bvaradar commented on issue #2151: URL: https://github.com/apache/hudi/issues/2151#issuecomment-706029681
@tandonraghav : You need to synchronize the compaction *scheduling* (not the compaction execution) with the Spark DF write. You can then run the scheduled compactions separately.

Just so you are aware of all the options: note that inline compaction does not need to run on every single ingest. You can set it to run every N commits, but when it does run it will still be inline (blocking the write). Let me open a jira to support such a setup.

We usually have folks running async compaction in delta-streamer continuous mode and, more recently, in structured streaming. Async compaction with Spark DF writes or in delta-streamer run-once mode is generally not done, as users need to set up a separate compaction job.

Meanwhile, you can comment out the line https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java#L724 so that inline compaction only schedules compactions without executing them. You can then use your writeClient code to run the compactions. Let me know if this makes sense.
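As a rough sketch of that setup (assuming the current Spark write client API on master; class names, config keys, and the `basePath`/`tableName`/`engineContext` variables are illustrative and may differ in your version), the separate compaction job could look something like this:

```java
import org.apache.hudi.client.SparkRDDWriteClient;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.config.HoodieWriteConfig;

public class CompactionRunner {
  public static void runPendingCompaction(
      org.apache.hudi.client.common.HoodieSparkEngineContext engineContext,
      String basePath, String tableName) {
    // Step 1 happens in the ingest job: the Spark DF write runs with inline
    // compaction configured to trigger every N delta commits, e.g.
    //   .option("hoodie.compact.inline", "true")
    //   .option("hoodie.compact.inline.max.delta.commits", "5")
    // and with the execution line in AbstractHoodieWriteClient commented out,
    // so only the compaction *plan* is written.

    // Step 2: a separate job executes the scheduled compaction with a write client.
    HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
        .withPath(basePath)      // base path of the MOR table (assumption)
        .forTable(tableName)
        .build();

    try (SparkRDDWriteClient writeClient = new SparkRDDWriteClient(engineContext, config)) {
      // scheduleCompaction returns the compaction instant time if one was scheduled;
      // if scheduling already happened inline, you would instead look up the
      // pending compaction instant from the timeline.
      Option<String> instant = writeClient.scheduleCompaction(Option.empty());
      if (instant.isPresent()) {
        // Executes the compaction plan for that instant (blocking this job only,
        // not the ingest writer).
        writeClient.compact(instant.get());
      }
    }
  }
}
```

This keeps the ingest path responsible only for producing compaction plans at commit boundaries (the part that must be synchronized with the write), while the actual merge work runs in its own Spark application on whatever schedule you choose.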
