This is an automated email from the ASF dual-hosted git repository.
danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new ecb3d125de0 [DOCS] Update hoodie_cleaner.md , Spelling correction and one example cli command updated (#12459)
ecb3d125de0 is described below
commit ecb3d125de071f8c12c7a214cbfc962114dbab8c
Author: Krishna Prasad <[email protected]>
AuthorDate: Wed Dec 11 12:00:39 2024 +0900
[DOCS] Update hoodie_cleaner.md , Spelling correction and one example cli command updated (#12459)
h2. Spelling correction
* atleast => at least
* long => the long
h2. One example CLI command updated
The heading says `Example of cleaner keeping the latest 10 commits`, but in the CLI command it's 3, so the heading should be fixed as below:
`Example of cleaner keeping the latest 3 commits` (the default value of the config is already 10).
---
website/versioned_docs/version-0.15.0/hoodie_cleaner.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/website/versioned_docs/version-0.15.0/hoodie_cleaner.md b/website/versioned_docs/version-0.15.0/hoodie_cleaner.md
index c050604c6e9..5042d6802d4 100644
--- a/website/versioned_docs/version-0.15.0/hoodie_cleaner.md
+++ b/website/versioned_docs/version-0.15.0/hoodie_cleaner.md
@@ -8,8 +8,8 @@ toc_max_heading_level: 4
Cleaning is a table service employed by Hudi to reclaim space occupied by older versions of data and keep storage costs
in check. Apache Hudi provides snapshot isolation between writers and readers by managing multiple versioned files with **MVCC**
concurrency. These file versions provide history and enable time travel and rollbacks, but it is important to manage
-how much history you keep to balance your costs. Cleaning service plays a crucial role in manging the tradeoff between
-retaining long history of data and the associated storage costs.
+how much history you keep to balance your costs. Cleaning service plays a crucial role in managing the tradeoff between
+retaining the long history of data and the associated storage costs.
Hudi enables [Automatic Hudi cleaning](/docs/configurations/#hoodiecleanautomatic) by default. Cleaning is invoked
immediately after each commit, to delete older file slices. It's recommended to leave this enabled to ensure metadata
@@ -32,7 +32,7 @@ Hudi cleaner currently supports the below cleaning policies to keep a certain nu
- **KEEP_LATEST_COMMITS**: This is the default policy. This is a temporal cleaning policy that ensures the effect of
  having lookback into all the changes that happened in the last X commits. Suppose a writer is ingesting data
  into a Hudi dataset every 30 minutes and the longest running query can take 5 hours to finish, then the user should
- retain atleast the last 10 commits. With such a configuration, we ensure that the oldest version of a file is kept on
+ retain at least the last 10 commits. With such a configuration, we ensure that the oldest version of a file is kept on
  disk for at least 5 hours, thereby preventing the longest running query from failing at any point in time. Incremental
  cleaning is also possible using this policy.
Number of commits to retain can be configured by [`hoodie.cleaner.commits.retained`](/docs/configurations/#hoodiecleanercommitsretained).
@@ -133,7 +133,7 @@ CLI provides the below commands for cleaner service:
- `clean showpartitions`
- `cleans run`
-Example of cleaner keeping the latest 10 commits
+Example of cleaner keeping the latest 3 commits
```
cleans run --sparkMaster local --hoodieConfigs hoodie.cleaner.policy=KEEP_LATEST_COMMITS hoodie.cleaner.commits.retained=3 hoodie.cleaner.parallelism=200
```
@@ -145,4 +145,4 @@ You can find more details and the relevant code for these commands in [`org.apac
* [Cleaner Service: Save up to 40% on data lake storage costs | Hudi Labs](https://youtu.be/mUvRhJDoO3w)
* [Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning #1](https://www.youtube.com/watch?v=CEzgFtmVjx4)
-* [Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning #2](https://www.youtube.com/watch?v=RbBF9Ys2GqM)
\ No newline at end of file
+* [Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning #2](https://www.youtube.com/watch?v=RbBF9Ys2GqM)