[GitHub] [hudi] kywe665 commented on a change in pull request #4010: [HUDI-2770] - Docs for (HUDI-2737) - Use earliest instant for async compaction and clustering

GitBox Mon, 22 Nov 2021 17:15:19 -0800


kywe665 commented on a change in pull request #4010:
URL: https://github.com/apache/hudi/pull/4010#discussion_r754745374




##########
File path: website/docs/compaction.md
##########
@@ -1,33 +1,26 @@
 ---
 title: Compaction
-summary: "In this page, we describe async compaction in Hudi."
 toc: true
 last_modified_at:
 ---
 
-For Merge-On-Read table, data is stored using a combination of columnar (e.g parquet) + row based (e.g avro) file formats.
-Updates are logged to delta files & later compacted to produce new versions of columnar files synchronously or
-asynchronously. One of the main motivations behind Merge-On-Read is to reduce data latency when ingesting records.
-Hence, it makes sense to run compaction asynchronously without blocking ingestion.
-
+Compaction is executed asynchronously with Hudi by default.
 
 ## Async Compaction
-
 Async Compaction is performed in 2 steps:
 
 1. ***Compaction Scheduling***: This is done by the ingestion job. In this step, Hudi scans the partitions and selects **file
    slices** to be compacted. A compaction plan is finally written to Hudi timeline.
 1. ***Compaction Execution***: A separate process reads the compaction plan and performs compaction of file slices.
 
+## Scheduling Async Compaction
 
-## Deployment Models
-
-There are few ways by which we can execute compactions asynchronously.
+There are few ways by which we can schedule compactions to the Hudi timeline to be executed later asynchronously.
 
-### Spark Structured Streaming
+### Schedule compaction with Spark Structured Streaming

Review comment:
       Nice catch. I reverted these changes.




