This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 498a5b5 Travis CI build asf-site
498a5b5 is described below
commit 498a5b58a866057541bfb2a1eccef5e136ef808d
Author: CI <[email protected]>
AuthorDate: Sun Jun 13 23:05:11 2021 +0000
Travis CI build asf-site
---
content/activity.html | 24 ++
.../blog/hoodie-cleaner/Initial_timeline.png | Bin 0 -> 141789 bytes
.../blog/hoodie-cleaner/Retain_latest_commits.png | Bin 0 -> 145891 bytes
.../blog/hoodie-cleaner/Retain_latest_versions.png | Bin 0 -> 146622 bytes
content/assets/js/lunr/lunr-store.js | 5 +
content/blog.html | 24 ++
.../index.html | 358 +++++++++++++++++++++
content/cn/activity.html | 24 ++
content/sitemap.xml | 4 +
9 files changed, 439 insertions(+)
diff --git a/content/activity.html b/content/activity.html
index 3cacb6f..9171a39 100644
--- a/content/activity.html
+++ b/content/activity.html
@@ -195,6 +195,30 @@
<h2 class="archive__item-title" itemprop="headline">
+ <a href="/blog/employing-right-configurations-for-hudi-cleaner/"
rel="permalink">Employing correct configurations for Hudi’s cleaner table
service
+</a>
+
+ </h2>
+ <!-- Look the author details up from the site config. -->
+
+ <!-- Output author details if some exist. -->
+ <div class="archive__item-meta"><a
href="https://cwiki.apache.org/confluence/display/~pratyakshsharma">Pratyaksh
Sharma</a> posted on <time datetime="2021-06-10">June 10, 2021</time></div>
+
+ <p class="archive__item-excerpt" itemprop="description">Ensuring isolation
between Hudi writers and readers using HoodieCleaner.java
+</p>
+ </article>
+</div>
+
+
+
+
+
+
+<div class="list__item">
+ <article class="archive__item" itemscope
itemtype="https://schema.org/CreativeWork">
+
+ <h2 class="archive__item-title" itemprop="headline">
+
<a href="/blog/hudi-file-sizing/" rel="permalink">Streaming
Responsibly - How Apache Hudi maintains optimum sized files
</a>
diff --git a/content/assets/images/blog/hoodie-cleaner/Initial_timeline.png
b/content/assets/images/blog/hoodie-cleaner/Initial_timeline.png
new file mode 100644
index 0000000..79780ba
Binary files /dev/null and
b/content/assets/images/blog/hoodie-cleaner/Initial_timeline.png differ
diff --git
a/content/assets/images/blog/hoodie-cleaner/Retain_latest_commits.png
b/content/assets/images/blog/hoodie-cleaner/Retain_latest_commits.png
new file mode 100644
index 0000000..e80e438
Binary files /dev/null and
b/content/assets/images/blog/hoodie-cleaner/Retain_latest_commits.png differ
diff --git
a/content/assets/images/blog/hoodie-cleaner/Retain_latest_versions.png
b/content/assets/images/blog/hoodie-cleaner/Retain_latest_versions.png
new file mode 100644
index 0000000..791c496
Binary files /dev/null and
b/content/assets/images/blog/hoodie-cleaner/Retain_latest_versions.png differ
diff --git a/content/assets/js/lunr/lunr-store.js
b/content/assets/js/lunr/lunr-store.js
index 2084038..f2f9981 100644
--- a/content/assets/js/lunr/lunr-store.js
+++ b/content/assets/js/lunr/lunr-store.js
@@ -1698,4 +1698,9 @@ var store = [{
"excerpt":"Apache Hudi is a data lake platform technology that
provides several functionalities needed to build and manage data lakes. One
such key feature that hudi provides is self-managing file sizing so that users
don’t need to worry about manual table maintenance. Having a lot of small files
will make it...","categories": ["blog"],
"tags": [],
"url": "https://hudi.apache.org/blog/hudi-file-sizing/",
+ "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
+ "title": "Employing correct configurations for Hudi's cleaner table
service",
+ "excerpt":"Apache Hudi provides snapshot isolation between writers and
readers. This is made possible by Hudi’s MVCC concurrency model. In this blog,
we will explain how to employ the right configurations to manage multiple file
versions. Furthermore, we will discuss mechanisms available to users on how to
maintain just the required...","categories": ["blog"],
+ "tags": [],
+ "url":
"https://hudi.apache.org/blog/employing-right-configurations-for-hudi-cleaner/",
"teaser":"https://hudi.apache.org/assets/images/500x300.png"},]
diff --git a/content/blog.html b/content/blog.html
index 8f9e796..3b4b8fe 100644
--- a/content/blog.html
+++ b/content/blog.html
@@ -193,6 +193,30 @@
<h2 class="archive__item-title" itemprop="headline">
+ <a href="/blog/employing-right-configurations-for-hudi-cleaner/"
rel="permalink">Employing correct configurations for Hudi’s cleaner table
service
+</a>
+
+ </h2>
+ <!-- Look the author details up from the site config. -->
+
+ <!-- Output author details if some exist. -->
+ <div class="archive__item-meta"><a
href="https://cwiki.apache.org/confluence/display/~pratyakshsharma">Pratyaksh
Sharma</a> posted on <time datetime="2021-06-10">June 10, 2021</time></div>
+
+ <p class="archive__item-excerpt" itemprop="description">Ensuring isolation
between Hudi writers and readers using HoodieCleaner.java
+</p>
+ </article>
+</div>
+
+
+
+
+
+
+<div class="list__item">
+ <article class="archive__item" itemscope
itemtype="https://schema.org/CreativeWork">
+
+ <h2 class="archive__item-title" itemprop="headline">
+
<a href="/blog/hudi-file-sizing/" rel="permalink">Streaming
Responsibly - How Apache Hudi maintains optimum sized files
</a>
diff --git
a/content/blog/employing-right-configurations-for-hudi-cleaner/index.html
b/content/blog/employing-right-configurations-for-hudi-cleaner/index.html
new file mode 100644
index 0000000..1482db9
--- /dev/null
+++ b/content/blog/employing-right-configurations-for-hudi-cleaner/index.html
@@ -0,0 +1,358 @@
+<!doctype html>
+<html lang="en" class="no-js">
+ <head>
+ <meta charset="utf-8">
+
+<!-- begin _includes/seo.html --><title>Employing correct configurations for
Hudi’s cleaner table service - Apache Hudi</title>
+<meta name="description" content="Ensuring isolation between Hudi writers and
readers using HoodieCleaner.java">
+
+<meta property="og:type" content="article">
+<meta property="og:locale" content="en_US">
+<meta property="og:site_name" content="">
+<meta property="og:title" content="Employing correct configurations for Hudi’s
cleaner table service">
+<meta property="og:url"
content="https://hudi.apache.org/blog/employing-right-configurations-for-hudi-cleaner/">
+
+
+ <meta property="og:description" content="Ensuring isolation between Hudi
writers and readers using HoodieCleaner.java">
+
+
+
+
+
+
+
+
+
+
+
+<!-- end _includes/seo.html -->
+
+
+<!--<link href="/feed.xml" type="application/atom+xml" rel="alternate" title="
Feed">-->
+
+<!-- https://t.co/dKP3o1e -->
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+<script>
+ document.documentElement.className =
document.documentElement.className.replace(/\bno-js\b/g, '') + ' js ';
+</script>
+
+<!-- For all browsers -->
+<link rel="stylesheet" href="/assets/css/main.css">
+
+<!--[if IE]>
+ <style>
+ /* old IE unsupported flexbox fixes */
+ .greedy-nav .site-title {
+ padding-right: 3em;
+ }
+ .greedy-nav button {
+ position: absolute;
+ top: 0;
+ right: 0;
+ height: 100%;
+ }
+ </style>
+<![endif]-->
+
+
+
+<link rel="icon" type="image/x-icon" href="/assets/images/favicon.ico">
+<link rel="stylesheet" href="/assets/css/font-awesome.min.css">
+<script src="/assets/js/jquery.min.js"></script>
+
+
+<script src="/assets/js/main.min.js"></script>
+
+ </head>
+
+ <body class="layout--single">
+ <!--[if lt IE 9]>
+<div class="notice--danger align-center" style="margin: 0;">You are using an
<strong>outdated</strong> browser. Please <a
href="https://browsehappy.com/">upgrade your browser</a> to improve your
experience.</div>
+<![endif]-->
+
+ <div class="masthead">
+ <div class="masthead__inner-wrap" id="masthead__inner-wrap">
+ <div class="masthead__menu">
+ <nav id="site-nav" class="greedy-nav">
+
+ <a class="site-logo" href="/">
+ <div style="width: 150px; height: 40px">
+ </div>
+ </a>
+
+ <a class="site-title" href="/">
+
+ </a>
+ <ul class="visible-links"><li class="masthead__menu-item">
+ <a href="/docs/spark_quick-start-guide.html" target="_self"
>Documentation</a>
+ </li><li class="masthead__menu-item">
+ <a href="/community.html" target="_self" >Community</a>
+ </li><li class="masthead__menu-item">
+ <a href="/blog.html" target="_self" >Blog</a>
+ </li><li class="masthead__menu-item">
+ <a href="https://cwiki.apache.org/confluence/display/HUDI/FAQ"
target="_blank" >FAQ</a>
+ </li><li class="masthead__menu-item">
+ <a href="/docs/powered_by.html" target="_self" >Powered By</a>
+ </li><li class="masthead__menu-item">
+ <a href="/releases.html" target="_self" >Releases</a>
+ </li><li class="masthead__menu-item">
+ <a href="/download.html" target="_self" >Download</a>
+ </li></ul>
+ <button class="greedy-nav__toggle hidden" type="button">
+ <span class="visually-hidden">Toggle menu</span>
+ <div class="navicon"></div>
+ </button>
+ <ul class="hidden-links hidden"></ul>
+ </nav>
+ </div>
+ </div>
+</div>
+<!--
+<p class="notice--warning" style="margin: 0 !important; text-align: center
!important;"><strong>Note:</strong> This site is work in progress, if you
notice any issues, please <a target="_blank"
href="https://github.com/apache/hudi/issues">Report on Issue</a>.
+ Click <a href="/"> here</a> back to old site.</p>
+-->
+
+ <div class="initial-content">
+ <div id="main" role="main">
+
+
+ <div class="sidebar sticky">
+
+
+ <div itemscope itemtype="https://schema.org/Person">
+
+ <div class="author__content">
+
+ <h3 class="author__name" itemprop="name">Quick Links</h3>
+
+
+ <div class="author__bio" itemprop="description">
+ <p>Hudi <em>ingests</em> & <em>manages</em> storage of large
analytical datasets over DFS.</p>
+
+ </div>
+
+ </div>
+
+ <div class="author__urls-wrapper">
+ <ul class="author__urls social-icons">
+
+
+ <li><a href="/docs/spark_quick-start-guide" target="_self"
rel="nofollow noopener noreferrer"><i class="fa fa-book"
aria-hidden="true"></i> Documentation</a></li>
+
+
+
+ <li><a href="https://cwiki.apache.org/confluence/display/HUDI"
target="_blank" rel="nofollow noopener noreferrer"><i class="fa fa-wikipedia-w"
aria-hidden="true"></i> Technical Wiki</a></li>
+
+
+
+ <li><a href="/contributing" target="_self" rel="nofollow noopener
noreferrer"><i class="fa fa-thumbs-o-up" aria-hidden="true"></i> Contribution
Guide</a></li>
+
+
+
+ <li><a
href="https://join.slack.com/t/apache-hudi/shared_invite/enQtODYyNDAxNzc5MTg2LTE5OTBlYmVhYjM0N2ZhOTJjOWM4YzBmMWU2MjZjMGE4NDc5ZDFiOGQ2N2VkYTVkNzU3ZDQ4OTI1NmFmYWQ0NzE"
target="_blank" rel="nofollow noopener noreferrer"><i class="fa fa-slack"
aria-hidden="true"></i> Join on Slack</a></li>
+
+
+
+ <li><a href="https://github.com/apache/hudi" target="_blank"
rel="nofollow noopener noreferrer"><i class="fa fa-github"
aria-hidden="true"></i> Fork on GitHub</a></li>
+
+
+
+ <li><a href="https://issues.apache.org/jira/projects/HUDI/summary"
target="_blank" rel="nofollow noopener noreferrer"><i class="fa fa-navicon"
aria-hidden="true"></i> Report Issues</a></li>
+
+
+
+ <li><a href="/security" target="_self" rel="nofollow noopener
noreferrer"><i class="fa fa-navicon" aria-hidden="true"></i> Report Security
Issues</a></li>
+
+
+
+
+ </ul>
+ </div>
+</div>
+
+
+
+
+ </div>
+
+
+ <article class="page" itemscope itemtype="https://schema.org/CreativeWork">
+ <!-- Look the author details up from the site config. -->
+
+
+ <div class="page__inner-wrap">
+
+ <header>
+ <h1 id="page-title" class="page__title"
itemprop="headline">Employing correct configurations for Hudi’s cleaner table
service
+</h1>
+ <!-- Output author details if some exist. -->
+ <div class="page__author"><a
href="https://cwiki.apache.org/confluence/display/~pratyakshsharma">Pratyaksh
Sharma</a> posted on <time datetime="2021-06-10">June 10, 2021</time></div>
+ </header>
+
+
+ <section class="page__content" itemprop="text">
+
+ <style>
+ .page {
+ padding-right: 0 !important;
+ }
+ </style>
+
+ <p>Apache Hudi provides snapshot isolation between writers and
readers. This is made possible by Hudi’s MVCC concurrency model. In this blog,
we will explain how to employ the right configurations to manage multiple file
versions. Furthermore, we will discuss mechanisms available to users on how to
maintain just the required number of old file versions so that long running
readers do not fail.</p>
+
+<h3
id="reclaiming-space-and-keeping-your-data-lake-storage-costs-in-check">Reclaiming
space and keeping your data lake storage costs in check</h3>
+
+<p>Hudi provides several table management services for managing your
tables on the data lake. One of these services is called the
<strong>Cleaner</strong>. As you write more data to your table, for every batch
of updates received, Hudi can either generate a new version of the data file
with updates applied to records (COPY_ON_WRITE) or write these delta updates to
a log file, avoiding rewriting a newer version of an existing file
(MERGE_ON_READ). In such situations, depending on [...]
+
+<h3 id="problem-statement">Problem Statement</h3>
+
+<p>In a data lake architecture, it is a very common scenario to have readers
and writers concurrently accessing the same table. As the Hudi cleaner service
periodically reclaims older file versions, scenarios arise where a long running
query might be accessing a file version that is deemed to be reclaimed by the
cleaner. Here, we need to employ the correct configs to ensure readers (aka
queries) don’t fail.</p>
+
+<h3 id="deeper-dive-into-hudi-cleaner">Deeper dive into Hudi Cleaner</h3>
+
+<p>To deal with the scenario above, let's understand the different
cleaning policies that Hudi offers and the corresponding properties that need
to be configured. Options are available to schedule cleaning asynchronously or
synchronously. Before going into more details, we would like to explain a few
underlying concepts:</p>
+
+<ul>
+ <li><strong>Hudi base file</strong>: Columnar file which consists of final
data after compaction. A base file’s name follows this naming
convention: <code
class="highlighter-rouge">&lt;fileId&gt;_&lt;writeToken&gt;_&lt;instantTime&gt;.parquet</code>.
In subsequent writes of this file, the file id remains the same and the commit time
gets updated to show the latest version. This also implies any particular
version of a record, given its partition path, can be uniquely located using
the [...]
+ <li><strong>File slice</strong>: A file slice consists of the base file and
any log files consisting of the delta, in case of MERGE_ON_READ table type.</li>
+ <li><strong>Hudi File Group</strong>: Any file group in Hudi is uniquely
identified by the partition path and the file id that the files in this group
have as part of their name. A file group consists of all the file slices in a
particular partition path. Also any partition path can have multiple file
groups.</li>
+</ul>
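+
+<p>The naming convention and grouping described above can be sketched in a few lines of Java. This is only an illustration: the <code class="highlighter-rouge">BaseFileNames</code> class, its methods and the sample file names are hypothetical and are not part of Hudi's API. The sketch assumes names of the form &lt;fileId&gt;_&lt;writeToken&gt;_&lt;instantTime&gt;.parquet and groups the instant times by file id, which within a single partition corresponds to one file group per key.</p>
+

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: these classes are hypothetical, not Hudi APIs.
class BaseFileNames {

    // Parsed parts of <fileId>_<writeToken>_<instantTime>.parquet
    static final class BaseFile {
        final String fileId;
        final String writeToken;
        final String instantTime;

        BaseFile(String fileId, String writeToken, String instantTime) {
            this.fileId = fileId;
            this.writeToken = writeToken;
            this.instantTime = instantTime;
        }
    }

    static BaseFile parse(String fileName) {
        String suffix = ".parquet";
        if (!fileName.endsWith(suffix)) {
            throw new IllegalArgumentException("Not a parquet base file: " + fileName);
        }
        String stem = fileName.substring(0, fileName.length() - suffix.length());
        // Split from the right so underscores inside the file id would not matter.
        int lastSep = stem.lastIndexOf('_');
        int midSep = lastSep < 0 ? -1 : stem.lastIndexOf('_', lastSep - 1);
        if (midSep < 0) {
            throw new IllegalArgumentException("Unexpected base file name: " + fileName);
        }
        return new BaseFile(stem.substring(0, midSep),
                stem.substring(midSep + 1, lastSep),
                stem.substring(lastSep + 1));
    }

    // Groups instant times by file id; within one partition, each key is one file group.
    static Map<String, List<String>> versionsByFileId(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            BaseFile f = parse(name);
            groups.computeIfAbsent(f.fileId, k -> new ArrayList<>()).add(f.instantTime);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up file names that follow the convention described in the text.
        List<String> names = Arrays.asList(
                "fg1_1-0-1_20210610100000.parquet",
                "fg1_1-0-1_20210610103000.parquet",
                "fg2_1-0-1_20210610103000.parquet");
        System.out.println(versionsByFileId(names));
    }
}
```

+
+<p>Because the file id stays fixed across writes, every entry under the same key is a different version of the same file, which is the unit the cleaner reasons about.</p>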
+
+<h3 id="cleaning-policies">Cleaning Policies</h3>
+
+<p>The Hudi cleaner currently supports the following cleaning policies:</p>
+
+<ul>
+  <li><strong>KEEP_LATEST_COMMITS</strong>: This is the default policy. It
is a temporal cleaning policy that lets readers look back at
all the changes that happened in the last X commits. Suppose a writer is
ingesting data into a Hudi dataset every 30 minutes and the longest running
query can take 5 hours to finish, then the user should retain at least the last
10 commits. With such a configuration, we ensure that the oldest version of a
file is kept on disk for at le [...]
+ <li><strong>KEEP_LATEST_FILE_VERSIONS</strong>: This policy has the effect
of keeping N file versions irrespective of time. This policy is
useful when you know the maximum number of versions of a file you want to keep
at any given time. To achieve the same behaviour as before of preventing long
running queries from failing, one should do their calculations based on data
patterns. Alternatively, this policy is also useful if a user just wants to
maintain 1 latest version of t [...]
+</ul>
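+
+<p>The arithmetic behind the KEEP_LATEST_COMMITS example can be sketched as follows. The class and method names are hypothetical, and the rule of rounding up the ratio of longest query time to ingestion interval is an assumption drawn from the examples in this post (30 minute ingestion with a 5 hour query gives 10 commits), not a formula taken from Hudi itself.</p>
+

```java
// Illustrative sketch only: a hypothetical helper, not part of Hudi.
class CleanerRetention {

    // Number of commits whose combined time span covers the longest running query.
    static int commitsToRetain(long ingestionIntervalMinutes, long longestQueryMinutes) {
        if (ingestionIntervalMinutes <= 0 || longestQueryMinutes < 0) {
            throw new IllegalArgumentException("Interval must be positive and query time non-negative");
        }
        // Ceiling division without floating point.
        long commits = (longestQueryMinutes + ingestionIntervalMinutes - 1) / ingestionIntervalMinutes;
        // Always retain at least one commit.
        return (int) Math.max(1, commits);
    }

    public static void main(String[] args) {
        // Ingestion every 30 minutes, longest query takes 5 hours.
        System.out.println(commitsToRetain(30, 5 * 60));
    }
}
```

+
+<p>The result would be the value to consider for <code class="highlighter-rouge">hoodie.cleaner.commits.retained</code>; adding some headroom on top of it is a reasonable precaution when query run times vary.</p>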
+
+<h3 id="examples">Examples</h3>
+
+<p>Suppose a user is ingesting data into a Hudi dataset of type COPY_ON_WRITE
every 30 minutes as shown below:</p>
+
+<p><img src="/assets/images/blog/hoodie-cleaner/Initial_timeline.png"
alt="Initial timeline" />
+<em>Figure1: Incoming records being ingested into a Hudi dataset every 30
minutes</em></p>
+
+<p>The figure shows a particular partition on DFS where commits and
corresponding file versions are color coded. Four different file groups are
created in this partition, depicted as fileGroup1, fileGroup2, fileGroup3 and
fileGroup4. fileGroup2 has records ingested from
all 5 commits, while fileGroup4 has records from
the latest 2 commits only.</p>
+
+<p>Suppose the user uses the below configs for cleaning:</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><span class="n">hoodie</span><span
class="o">.</span><span class="na">cleaner</span><span class="o">.</span><span
class="na">policy</span><span class="o">=</span><span
class="no">KEEP_LATEST_COMMITS</span>
+<span class="n">hoodie</span><span class="o">.</span><span
class="na">cleaner</span><span class="o">.</span><span
class="na">commits</span><span class="o">.</span><span
class="na">retained</span><span class="o">=</span><span class="mi">2</span>
+</code></pre></div></div>
+
+<p>The cleaner selects the file versions to clean according to the
following rules:</p>
+
+<ul>
+  <li>The latest version of a file is never cleaned.</li>
+ <li>The commit times of the last 2 (configured) + 1 commits are determined.
In Figure1, <code class="highlighter-rouge">commit 10:30</code> and <code
class="highlighter-rouge">commit 10:00</code> correspond to the latest 2
commits in the timeline. One extra commit is included because the time window
for retaining commits is essentially equal to the longest query run time. So if
the longest query takes 1 hour to finish, and ingestion happens every 30
minutes, you need to retain last 2 c [...]
+  <li>Now, for any file group, the only file slices scheduled for
cleaning are those which are not savepointed (another Hudi table service) and whose
commit time is earlier than the 3rd commit in reverse order (<code class="highlighter-rouge">commit
9:30</code> in the figure below).</li>
+</ul>
+
+<p><img src="/assets/images/blog/hoodie-cleaner/Retain_latest_commits.png"
alt="Retain latest commits" />
+<em>Figure2: Files corresponding to latest 3 commits are retained</em></p>
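+
+<p>The selection rules listed above can be mimicked with a small simulation. This is a simplified sketch under stated assumptions, not the actual cleaner code: the class and method are hypothetical, commit times are plain zero-padded strings, savepoints are ignored, and the real cleaner may apply additional rules. The five commit times assume ingestion every 30 minutes ending at 10:30, as in Figure1.</p>
+

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: a simplified model of KEEP_LATEST_COMMITS, not Hudi code.
class KeepLatestCommitsSketch {

    // Returns the versions (identified by commit time) of one file group to clean.
    static List<String> versionsToClean(List<String> timeline,
                                        List<String> versionCommitTimes,
                                        int commitsRetained) {
        List<String> toClean = new ArrayList<>();
        // The window is the configured number of commits plus one extra commit.
        int window = commitsRetained + 1;
        if (timeline.size() < window || versionCommitTimes.isEmpty()) {
            return toClean; // not enough history yet, nothing is cleaned
        }
        List<String> sortedTimeline = new ArrayList<>(timeline);
        Collections.sort(sortedTimeline);
        // Earliest commit inside the window (the 3rd commit in reverse order when retaining 2).
        String earliestRetained = sortedTimeline.get(sortedTimeline.size() - window);
        String latestVersion = Collections.max(versionCommitTimes);
        for (String commitTime : versionCommitTimes) {
            boolean olderThanWindow = commitTime.compareTo(earliestRetained) < 0;
            boolean isLatest = commitTime.equals(latestVersion);
            if (olderThanWindow && !isLatest) {
                toClean.add(commitTime);
            }
        }
        Collections.sort(toClean);
        return toClean;
    }

    public static void main(String[] args) {
        List<String> timeline = Arrays.asList("08:30", "09:00", "09:30", "10:00", "10:30");
        // fileGroup2 has a version from every commit, fileGroup4 only from the latest two.
        System.out.println("fileGroup2: " + versionsToClean(timeline, timeline, 2));
        System.out.println("fileGroup4: " + versionsToClean(timeline, Arrays.asList("10:00", "10:30"), 2));
    }
}
```

+
+<p>With 2 commits retained, only the versions of fileGroup2 written before <code class="highlighter-rouge">commit 9:30</code> are selected, and fileGroup4 is left untouched.</p>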
+
+<p>Now, suppose the user uses the below configs for cleaning:</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><span class="n">hoodie</span><span
class="o">.</span><span class="na">cleaner</span><span class="o">.</span><span
class="na">policy</span><span class="o">=</span><span
class="no">KEEP_LATEST_FILE_VERSIONS</span>
+<span class="n">hoodie</span><span class="o">.</span><span
class="na">cleaner</span><span class="o">.</span><span
class="na">fileversions</span><span class="o">.</span><span
class="na">retained</span><span class="o">=</span><span class="mi">1</span>
+</code></pre></div></div>
+
+<p>The cleaner does the following:</p>
+
+<ul>
+  <li>For any file group, the latest version of its file slices (including any
pending compaction) is kept and the rest are scheduled for cleaning.
As shown in Figure3, if a clean action is triggered right after <code
class="highlighter-rouge">commit 10:30</code>, the cleaner will simply leave
the latest version in every file group and delete the rest.</li>
+</ul>
+
+<p><img src="/assets/images/blog/hoodie-cleaner/Retain_latest_versions.png"
alt="Retain latest versions" />
+<em>Figure3: Latest file version in every file group is retained</em></p>
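+
+<p>The version-based policy can be sketched in the same spirit. Again, this is a hypothetical helper rather than Hudi code: it identifies each version of a file group by a zero-padded commit time string and ignores pending compaction, which the real cleaner takes into account.</p>
+

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: a simplified model of KEEP_LATEST_FILE_VERSIONS, not Hudi code.
class KeepLatestVersionsSketch {

    // Returns the versions of one file group to clean, keeping the newest N.
    static List<String> versionsToClean(List<String> versionCommitTimes, int versionsRetained) {
        if (versionsRetained < 1) {
            throw new IllegalArgumentException("At least one version must be retained");
        }
        List<String> sorted = new ArrayList<>(versionCommitTimes);
        Collections.sort(sorted);
        // Everything except the last N entries is eligible for cleaning.
        int cutoff = Math.max(0, sorted.size() - versionsRetained);
        return new ArrayList<>(sorted.subList(0, cutoff));
    }

    public static void main(String[] args) {
        List<String> fileGroup2 = Arrays.asList("08:30", "09:00", "09:30", "10:00", "10:30");
        List<String> fileGroup4 = Arrays.asList("10:00", "10:30");
        System.out.println("fileGroup2: " + versionsToClean(fileGroup2, 1));
        System.out.println("fileGroup4: " + versionsToClean(fileGroup4, 1));
    }
}
```

+
+<p>Unlike the commit-based policy, the outcome here does not depend on the timeline at all, only on how many versions each file group has accumulated.</p>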
+
+<h3 id="configurations">Configurations</h3>
+
+<p>You can find the details about all the possible configurations along with
the default values <a
href="https://hudi.apache.org/docs/configurations.html#compaction-configs">here</a>.</p>
+
+<h3 id="run-command">Run command</h3>
+
+<p>Hudi’s cleaner table service can be run as a separate process or along with
your data ingestion. As mentioned earlier, it basically cleans up any stale/old
files lying around. In case you want to run it along with ingesting data,
configs are available which enable you to run it <a
href="https://hudi.apache.org/docs/configurations.html#withAsyncClean">synchronously
or asynchronously</a>. You can use the below command for running the cleaner
independently:</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><span class="o">[</span><span
class="n">hoodie</span><span class="o">]</span><span class="err">$</span> <span
class="n">spark</span><span class="o">-</span><span class="n">submit</span>
<span class="o">--</span><span class="kd">class</span> <span
class="nc">org</span><span class="o">.</span><span
class="na">apache</span><span class="o">.</span><span
class="na">hudi</span><span class="o">.</sp [...]
+ <span class="o">--</span><span class="n">props</span> <span
class="nl">s3:</span><span
class="c1">///temp/hudi-ingestion-config/kafka-source.properties \</span>
+ <span class="o">--</span><span class="n">target</span><span
class="o">-</span><span class="n">base</span><span class="o">-</span><span
class="n">path</span> <span class="nl">s3:</span><span class="c1">///temp/hudi
\</span>
+ <span class="o">--</span><span class="n">spark</span><span
class="o">-</span><span class="n">master</span> <span
class="n">yarn</span><span class="o">-</span><span class="n">cluster</span>
+</code></pre></div></div>
+
+<p>In case you wish to run the cleaner service asynchronously with writing,
please configure the below:</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><span class="n">hoodie</span><span
class="o">.</span><span class="na">clean</span><span class="o">.</span><span
class="na">automatic</span><span class="o">=</span><span class="kc">true</span>
+<span class="n">hoodie</span><span class="o">.</span><span
class="na">clean</span><span class="o">.</span><span
class="na">async</span><span class="o">=</span><span class="kc">true</span>
+</code></pre></div></div>
+
+<p>Further, you can use <a
href="https://hudi.apache.org/docs/deployment.html#cli">Hudi CLI</a> for
managing your Hudi dataset. The CLI provides the following commands for the cleaner
service:</p>
+
+<ul>
+ <li><code class="highlighter-rouge">cleans show</code></li>
+ <li><code class="highlighter-rouge">clean showpartitions</code></li>
+ <li><code class="highlighter-rouge">cleans run</code></li>
+</ul>
+
+<p>You can find more details and the relevant code for these commands in <a
href="https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CleansCommand.java"><code
class="highlighter-rouge">org.apache.hudi.cli.commands.CleansCommand</code>
class</a>.</p>
+
+<h3 id="future-scope">Future Scope</h3>
+
+<p>Work is under way to introduce a new cleaning policy based on
elapsed time. This will help achieve consistent retention
regardless of how frequently ingestion happens. You may track the progress <a
href="https://issues.apache.org/jira/browse/HUDI-349">here</a>.</p>
+
+<p>We hope this blog gives you an idea about how to configure the Hudi cleaner
and the supported cleaning policies. Please visit the <a
href="https://hudi.apache.org/blog.html">blog section</a> for a deeper
understanding of various Hudi concepts. Cheers!</p>
+
+ </section>
+
+ <a href="#masthead__inner-wrap" class="back-to-top">Back to top
↑</a>
+
+
+
+
+ </div>
+
+ </article>
+
+</div>
+
+ </div>
+
+ <div class="page__footer">
+ <footer>
+
+<div class="row">
+ <div class="col-lg-12 footer">
+ <p>
+ <table class="table-apache-info">
+ <tr>
+ <td>
+ <a class="footer-link-img" href="https://apache.org">
+ <img width="250px" src="/assets/images/asf_logo.svg" alt="The
Apache Software Foundation">
+ </a>
+ </td>
+ <td>
+ <a style="float: right"
href="https://www.apache.org/events/current-event.html">
+ <img
src="https://www.apache.org/events/current-event-234x60.png" />
+ </a>
+ </td>
+ </tr>
+ </table>
+ </p>
+ <p>
+ <a href="https://www.apache.org/licenses/">License</a> | <a
href="https://www.apache.org/security/">Security</a> | <a
href="https://www.apache.org/foundation/thanks.html">Thanks</a> | <a
href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
+ </p>
+ <p>
+ Copyright © <span id="copyright-year">2019</span> <a
href="https://apache.org">The Apache Software Foundation</a>, Licensed under
the <a href="https://www.apache.org/licenses/LICENSE-2.0"> Apache License,
Version 2.0</a>.
+ Hudi, Apache and the Apache feather logo are trademarks of The Apache
Software Foundation. <a href="/docs/privacy">Privacy Policy</a>
+ </p>
+ </div>
+</div>
+ </footer>
+ </div>
+
+
+ </body>
+</html>
\ No newline at end of file
diff --git a/content/cn/activity.html b/content/cn/activity.html
index f678eca..dce906a 100644
--- a/content/cn/activity.html
+++ b/content/cn/activity.html
@@ -193,6 +193,30 @@
<h2 class="archive__item-title" itemprop="headline">
+ <a href="/blog/employing-right-configurations-for-hudi-cleaner/"
rel="permalink">Employing correct configurations for Hudi’s cleaner table
service
+</a>
+
+ </h2>
+ <!-- Look the author details up from the site config. -->
+
+ <!-- Output author details if some exist. -->
+ <div class="archive__item-meta"><a
href="https://cwiki.apache.org/confluence/display/~pratyakshsharma">Pratyaksh
Sharma</a> posted on <time datetime="2021-06-10">June 10, 2021</time></div>
+
+ <p class="archive__item-excerpt" itemprop="description">Ensuring isolation
between Hudi writers and readers using HoodieCleaner.java
+</p>
+ </article>
+</div>
+
+
+
+
+
+
+<div class="list__item">
+ <article class="archive__item" itemscope
itemtype="https://schema.org/CreativeWork">
+
+ <h2 class="archive__item-title" itemprop="headline">
+
<a href="/blog/hudi-file-sizing/" rel="permalink">Streaming
Responsibly - How Apache Hudi maintains optimum sized files
</a>
diff --git a/content/sitemap.xml b/content/sitemap.xml
index ded240e..5890ce9 100644
--- a/content/sitemap.xml
+++ b/content/sitemap.xml
@@ -1361,6 +1361,10 @@
<lastmod>2021-03-01T00:00:00-05:00</lastmod>
</url>
<url>
+<loc>https://hudi.apache.org/blog/employing-right-configurations-for-hudi-cleaner/</loc>
+<lastmod>2021-06-10T00:00:00-04:00</lastmod>
+</url>
+<url>
<loc>https://hudi.apache.org/cn/activity</loc>
<lastmod>2019-12-30T14:59:57-05:00</lastmod>
</url>