bhasudha commented on code in PR #5601:
URL: https://github.com/apache/hudi/pull/5601#discussion_r874435884


##########
website/docs/compaction.md:
##########
@@ -10,7 +10,7 @@ Compaction is executed asynchronously with Hudi by default. Async Compaction is
 
 1. ***Compaction Scheduling***: This is done by the ingestion job. In this step, Hudi scans the partitions and selects **file
    slices** to be compacted. A compaction plan is finally written to Hudi timeline.
-1. ***Compaction Execution***: A separate process reads the compaction plan and performs compaction of file slices.
+1. ***Compaction Execution***: In this step the compaction plan is read and file slices are compacted.

Review Comment:
   These look like already-merged changes that are not relevant to this PR. Can you
ensure you have rebased on apache/hudi?



##########
website/learn/faq.md:
##########
@@ -253,6 +253,28 @@ Simplest way to run compaction on MOR dataset is to run the [compaction inline](
 
 That said, for obvious reasons of not blocking ingesting for compaction, you may want to run it asynchronously as well. This can be done either via a separate [compaction job](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java) that is scheduled by your workflow scheduler/notebook independently. If you are using delta streamer, then you can run in [continuous mode](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L241) where the ingestion and compaction are both managed concurrently in a single spark run time.
 
+### What options do I have for asynchronous/offline compactions on MOR dataset?

Review Comment:
   ditto


