This is an automated email from the ASF dual-hosted git repository.
danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new ecb3d125de0 [DOCS] Update hoodie_cleaner.md , Spelling correction and one example cli command updated (#12459)
ecb3d125de0 is described below
commit ecb3d125de071f8c12c7a214cbfc962114dbab8c
Author: Krishna Prasad <[email protected]>
AuthorDate: Wed Dec 11 12:00:39 2024 +0900
[DOCS] Update hoodie_cleaner.md , Spelling correction and one example cli command updated (#12459)
h2. Spelling correction
* atleast => at least
* long => the long
h2. One example CLI command updated
The heading says `Example of cleaner keeping the latest 10 commits`, but in the CLI command it's 3, so the heading should be fixed as below:
`Example of cleaner keeping the latest 3 commits` (the default value of the config is already 10).
---
website/versioned_docs/version-0.15.0/hoodie_cleaner.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/website/versioned_docs/version-0.15.0/hoodie_cleaner.md b/website/versioned_docs/version-0.15.0/hoodie_cleaner.md
index c050604c6e9..5042d6802d4 100644
--- a/website/versioned_docs/version-0.15.0/hoodie_cleaner.md
+++ b/website/versioned_docs/version-0.15.0/hoodie_cleaner.md
@@ -8,8 +8,8 @@ toc_max_heading_level: 4
Cleaning is a table service employed by Hudi to reclaim space occupied by older versions of data and keep storage costs
in check. Apache Hudi provides snapshot isolation between writers and readers by managing multiple versioned files with **MVCC**
concurrency. These file versions provide history and enable time travel and rollbacks, but it is important to manage
-how much history you keep to balance your costs. Cleaning service plays a crucial role in manging the tradeoff between
-retaining long history of data and the associated storage costs.
+how much history you keep to balance your costs. Cleaning service plays a crucial role in managing the tradeoff between
+retaining the long history of data and the associated storage costs.
Hudi enables [Automatic Hudi cleaning](/docs/configurations/#hoodiecleanautomatic) by default. Cleaning is invoked
immediately after each commit, to delete older file slices. It's recommended to leave this enabled to ensure metadata
@@ -32,7 +32,7 @@ Hudi cleaner currently supports the below cleaning policies to keep a certain nu
- **KEEP_LATEST_COMMITS**: This is the default policy. This is a temporal cleaning policy that ensures the effect of
  having lookback into all the changes that happened in the last X commits. Suppose a writer is ingesting data
  into a Hudi dataset every 30 minutes and the longest running query can take 5 hours to finish, then the user should
- retain atleast the last 10 commits. With such a configuration, we ensure that the oldest version of a file is kept on
+ retain at least the last 10 commits. With such a configuration, we ensure that the oldest version of a file is kept on
  disk for at least 5 hours, thereby preventing the longest running query from failing at any point in time. Incremental
  cleaning is also possible using this policy.
Number of commits to retain can be configured by [`hoodie.cleaner.commits.retained`](/docs/configurations/#hoodiecleanercommitsretained).
@@ -133,7 +133,7 @@ CLI provides the below commands for cleaner service:
- `clean showpartitions`
- `cleans run`
-Example of cleaner keeping the latest 10 commits
+Example of cleaner keeping the latest 3 commits
```
cleans run --sparkMaster local --hoodieConfigs hoodie.cleaner.policy=KEEP_LATEST_COMMITS hoodie.cleaner.commits.retained=3 hoodie.cleaner.parallelism=200
```
@@ -145,4 +145,4 @@ You can find more details and the relevant code for these commands in [`org.apac
* [Cleaner Service: Save up to 40% on data lake storage costs | Hudi Labs](https://youtu.be/mUvRhJDoO3w)
* [Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning #1](https://www.youtube.com/watch?v=CEzgFtmVjx4)
-* [Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning #2](https://www.youtube.com/watch?v=RbBF9Ys2GqM)
\ No newline at end of file
+* [Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning #2](https://www.youtube.com/watch?v=RbBF9Ys2GqM)