bvaradar commented on issue #2151: URL: https://github.com/apache/hudi/issues/2151#issuecomment-706029681
@tandonraghav : You need to synchronize the compaction *scheduling* (not the compaction execution) with the Spark DF write. You can then run the scheduled compactions separately.

Just so you are aware of all the options: note that inline compaction does not need to run on every single ingest. You can set it to run every N commits, but when it does run it will still be inline (blocking the write). Let me open a jira to support such a setup.

We usually have folks running async compaction in delta-streamer continuous mode and, more recently, in structured streaming. Async compaction with Spark DF writes or in delta-streamer run-once mode is generally not done, as users need to set up a separate compaction job.

Meanwhile, you can comment out the line https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java#L724 so that inline compaction only schedules compactions without executing them. You can then use your writeClient code to run the compactions. Let me know if this makes sense.
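As a rough sketch of that setup (assuming the current Spark write client API on master; class names, config keys, and the `basePath`/`tableName`/`engineContext` variables are illustrative and may differ in your version), the separate compaction job could look something like this:

```java
import org.apache.hudi.client.SparkRDDWriteClient;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.config.HoodieWriteConfig;

public class CompactionRunner {
  public static void runPendingCompaction(
      org.apache.hudi.client.common.HoodieSparkEngineContext engineContext,
      String basePath, String tableName) {
    // Step 1 happens in the ingest job: the Spark DF write runs with inline
    // compaction configured to trigger every N delta commits, e.g.
    //   .option("hoodie.compact.inline", "true")
    //   .option("hoodie.compact.inline.max.delta.commits", "5")
    // and with the execution line in AbstractHoodieWriteClient commented out,
    // so only the compaction *plan* is written.

    // Step 2: a separate job executes the scheduled compaction with a write client.
    HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
        .withPath(basePath)      // base path of the MOR table (assumption)
        .forTable(tableName)
        .build();

    try (SparkRDDWriteClient writeClient = new SparkRDDWriteClient(engineContext, config)) {
      // scheduleCompaction returns the compaction instant time if one was scheduled;
      // if scheduling already happened inline, you would instead look up the
      // pending compaction instant from the timeline.
      Option<String> instant = writeClient.scheduleCompaction(Option.empty());
      if (instant.isPresent()) {
        // Executes the compaction plan for that instant (blocking this job only,
        // not the ingest writer).
        writeClient.compact(instant.get());
      }
    }
  }
}
```

This keeps the ingest path responsible only for producing compaction plans at commit boundaries (the part that must be synchronized with the write), while the actual merge work runs in its own Spark application on whatever schedule you choose.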
