This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new dbe2d915be [HUDI-4566] Document configuration updates (#6381)
dbe2d915be is described below

commit dbe2d915be4079e3a2286527267a747539ec77ef
Author: Sagar Sumit <[email protected]>
AuthorDate: Tue Aug 16 15:02:10 2022 +0530

    [HUDI-4566] Document configuration updates (#6381)
---
 hudi-utils/README.md                         |    8 +-
 hudi-utils/{README.md => generate_config.sh} |   52 +-
 hudi-utils/pom.xml                           |    2 +-
 website/docs/basic_configurations.md         |  286 +++--
 website/docs/configurations.md               | 1661 +++++++++++++++-----------
 website/docs/writing_data.md                 |    2 +-
 6 files changed, 1226 insertions(+), 785 deletions(-)
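For reviewers unfamiliar with the utility touched here: the new generate_config.sh builds a java classpath from the bundle jars using bash's `printf -v`. A minimal, self-contained sketch of that idiom (dummy jar names, not the real bundle paths):

```shell
#!/bin/bash
# Sketch of the classpath-building idiom used in generate_config.sh.
# printf -v writes into the named variable instead of stdout; the ':%s'
# format is applied once per array element, joining them with ':'.
JARS=(
  "a.jar"
  "b.jar"
  "c.jar"
)
printf -v CLASSPATH ':%s' "${JARS[@]}"
echo "$CLASSPATH"   # -> :a.jar:b.jar:c.jar
```

The leading ':' is harmless because the script appends the result directly after another jar path, as in `hudi-utils-1.0-SNAPSHOT-jar-with-dependencies.jar$CLASSPATH`.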

diff --git a/hudi-utils/README.md b/hudi-utils/README.md
index 34f28067c5..ce39aea016 100644
--- a/hudi-utils/README.md
+++ b/hudi-utils/README.md
@@ -23,9 +23,12 @@ If not, execute `mvn install -DskipTests` in your hudi repo  <br/>
 mvn clean
 mvn install
 ```
-Set the appropriate SNAPSHOT version and execute the below commands
+
+When new config classes are added, or existing ones are moved to a separate module,
+please add the corresponding bundle for configurations of that module to be picked up.
+Set the appropriate SNAPSHOT version and execute the below commands. The commands below are also available in the [generate_config.sh](generate_config.sh) script.
 ```shell
-VERSION=0.12.0
+VERSION=0.13.0
 
 JARS=(
 "$HOME/.m2/repository/org/apache/hudi/hudi-utilities-bundle_2.11/$VERSION-SNAPSHOT/hudi-utilities-bundle_2.11-$VERSION-SNAPSHOT.jar"
@@ -34,6 +37,7 @@ JARS=(
 "$HOME/.m2/repository/org/apache/hudi/hudi-kafka-connect-bundle/$VERSION-SNAPSHOT/hudi-kafka-connect-bundle-$VERSION-SNAPSHOT.jar"
 "$HOME/.m2/repository/org/apache/hudi/hudi-datahub-sync-bundle/$VERSION-SNAPSHOT/hudi-datahub-sync-bundle-$VERSION-SNAPSHOT.jar"
 "$HOME/.m2/repository/org/apache/hudi/hudi-gcp-bundle/$VERSION-SNAPSHOT/hudi-gcp-bundle-$VERSION-SNAPSHOT.jar"
+"$HOME/.m2/repository/org/apache/hudi/hudi-aws-bundle/$VERSION-SNAPSHOT/hudi-aws-bundle-$VERSION-SNAPSHOT.jar"
 )
 
 printf -v CLASSPATH ':%s' "${JARS[@]}"
diff --git a/hudi-utils/README.md b/hudi-utils/generate_config.sh
old mode 100644
new mode 100755
similarity index 50%
copy from hudi-utils/README.md
copy to hudi-utils/generate_config.sh
index 34f28067c5..89900de8b0
--- a/hudi-utils/README.md
+++ b/hudi-utils/generate_config.sh
@@ -1,31 +1,23 @@
-<!--
-  Licensed to the Apache Software Foundation (ASF) under one or more
-  contributor license agreements.  See the NOTICE file distributed with
-  this work for additional information regarding copyright ownership.
-  The ASF licenses this file to You under the Apache License, Version 2.0
-  (the "License"); you may not use this file except in compliance with
-  the License.  You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
--->
-
-Execute these from hudi-utils dir <br/>
-Ensure you have hudi artifacts from latest master installed <br/> 
-If not, execute `mvn install -DskipTests` in your hudi repo  <br/>
-
-```shell
-mvn clean
-mvn install
-```
-Set the appropriate SNAPSHOT version and execute the below commands
-```shell
-VERSION=0.12.0
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+VERSION=0.13.0
 
 JARS=(
 "$HOME/.m2/repository/org/apache/hudi/hudi-utilities-bundle_2.11/$VERSION-SNAPSHOT/hudi-utilities-bundle_2.11-$VERSION-SNAPSHOT.jar"
@@ -34,6 +26,7 @@ JARS=(
 "$HOME/.m2/repository/org/apache/hudi/hudi-kafka-connect-bundle/$VERSION-SNAPSHOT/hudi-kafka-connect-bundle-$VERSION-SNAPSHOT.jar"
 "$HOME/.m2/repository/org/apache/hudi/hudi-datahub-sync-bundle/$VERSION-SNAPSHOT/hudi-datahub-sync-bundle-$VERSION-SNAPSHOT.jar"
 "$HOME/.m2/repository/org/apache/hudi/hudi-gcp-bundle/$VERSION-SNAPSHOT/hudi-gcp-bundle-$VERSION-SNAPSHOT.jar"
+"$HOME/.m2/repository/org/apache/hudi/hudi-aws-bundle/$VERSION-SNAPSHOT/hudi-aws-bundle-$VERSION-SNAPSHOT.jar"
 )
 
 printf -v CLASSPATH ':%s' "${JARS[@]}"
@@ -43,6 +36,3 @@ java -cp target/hudi-utils-1.0-SNAPSHOT-jar-with-dependencies.jar$CLASSPATH \
 org.apache.hudi.utils.HoodieConfigDocGenerator
 
 cp /tmp/configurations.md ../website/docs/configurations.md
-```
-
-Once complete, please put up a patch with latest configurations.
\ No newline at end of file
diff --git a/hudi-utils/pom.xml b/hudi-utils/pom.xml
index cfb3dbf836..91a15ff612 100644
--- a/hudi-utils/pom.xml
+++ b/hudi-utils/pom.xml
@@ -27,7 +27,7 @@
 
     <properties>
         <jdk.version>1.8</jdk.version>
-        <hudi.version>0.12.0-SNAPSHOT</hudi.version>
+        <hudi.version>0.13.0-SNAPSHOT</hudi.version>
         <hudi.spark.module>hudi-spark2</hudi.spark.module>
         <scala.binary.version>2.11</scala.binary.version>
         <junit.version>4.11</junit.version>
diff --git a/website/docs/basic_configurations.md b/website/docs/basic_configurations.md
index 09c6259616..8230e2a2fc 100644
--- a/website/docs/basic_configurations.md
+++ b/website/docs/basic_configurations.md
@@ -145,8 +145,8 @@ By default false (the names of partition folders are only partition values)<br><
 ---
 
 > #### hoodie.datasource.hive_sync.partition_extractor_class
-> Class which implements PartitionValueExtractor to extract the partition values, default 'SlashEncodedDayPartitionValueExtractor'.<br></br>
-> **Default Value**: org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor (Optional)<br></br>
+> Class which implements PartitionValueExtractor to extract the partition values, default 'MultiPartKeysValueExtractor'.<br></br>
+> **Default Value**: org.apache.hudi.hive.MultiPartKeysValueExtractor (Optional)<br></br>
 > `Config Param: HIVE_PARTITION_EXTRACTOR_CLASS`<br></br>
 
 ---
@@ -428,19 +428,32 @@ Configurations that control write behavior on Hudi tables. These can be directly
 
 ### Compaction Configs {#Compaction-Configs}
 
-Configurations that control compaction (merging of log files onto new base files) as well as cleaning (reclamation of older/unused file groups/slices).
+Configurations that control compaction (merging of log files onto new base files).
 
 `Config Class`: org.apache.hudi.config.HoodieCompactionConfig<br></br>
+> #### hoodie.compaction.lazy.block.read
+> When merging the delta log files, this config helps to choose whether the log blocks should be read lazily or not. Choose true to use lazy block reading (low memory usage, but incurs seeks to each block header) or false for immediate block read (higher memory usage)<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COMPACTION_LAZY_BLOCK_READ_ENABLE`<br></br>
 
-> #### hoodie.cleaner.policy
-> Cleaning policy to be used. The cleaner service deletes older file slices files to re-claim space. By default, cleaner spares the file slices written by the last N commits, determined by hoodie.cleaner.commits.retained Long running query plans may often refer to older file slices and will break if those are cleaned, before the query has had a chance to run. So, it is good to make sure that the data is retained for more than the maximum query execution time<br></br>
-> **Default Value**: KEEP_LATEST_COMMITS (Optional)<br></br>
-> `Config Param: CLEANER_POLICY`<br></br>
+---
+
+> #### hoodie.parquet.small.file.limit
+> During upsert operation, we opportunistically expand existing small files on storage, instead of writing new files, to keep the number of files at an optimum. This config sets the file size limit below which a file on storage becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set <= 0, Hudi will not try to get small files and will directly write new files<br></br>
+> **Default Value**: 104857600 (Optional)<br></br>
+> `Config Param: PARQUET_SMALL_FILE_LIMIT`<br></br>
+
+---
+
+> #### hoodie.compaction.strategy
+> Compaction strategy decides which file groups are picked up for compaction during each compaction run. By default, Hudi picks the log file with most accumulated unmerged data<br></br>
+> **Default Value**: org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy (Optional)<br></br>
+> `Config Param: COMPACTION_STRATEGY`<br></br>
 
 ---
 
 > #### hoodie.copyonwrite.record.size.estimate
-> The average record size. If not explicitly specified, hudi will compute the record size estimate dynamically based on commit metadata. This is critical in computing the insert parallelism and bin-packing inserts into small files.<br></br>
+> The average record size. If not explicitly specified, hudi will compute the record size estimate dynamically based on commit metadata. This is critical in computing the insert parallelism and bin-packing inserts into small files.<br></br>
 > **Default Value**: 1024 (Optional)<br></br>
 > `Config Param: COPY_ON_WRITE_RECORD_SIZE_ESTIMATE`<br></br>
 
@@ -453,80 +466,122 @@ Configurations that control compaction (merging of log files onto new base files
 
 ---
 
-> #### hoodie.cleaner.commits.retained
-> Number of commit to retain when cleaner is triggered with KEEP_LATEST_COMMITS cleaning policy. Make sure to configure this property properly so that the longest running query is able to succeed. This also directly translates into how much data retention the table supports for incremental queries.
-> **Default Value**: 10 (Optional)<br></br>
-> `Config Param: CLEANER_COMMITS_RETAINED`<br></br>
+> #### hoodie.compaction.target.io
+> Amount of MBs to spend during a compaction run for the LogFileSizeBasedCompactionStrategy. This value helps bound ingestion latency while compaction is run in inline mode.<br></br>
+> **Default Value**: 512000 (Optional)<br></br>
+> `Config Param: TARGET_IO_PER_COMPACTION_IN_MB`<br></br>
 
 ---
 
-> #### hoodie.clean.async
-> Only applies when hoodie.clean.automatic is turned on. When turned on runs cleaner async with writing, which can speed up overall write performance.<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: ASYNC_CLEAN`<br></br>
+> #### hoodie.compaction.logfile.size.threshold
+> Only if the log file size is greater than the threshold in bytes, the file group will be compacted.<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: COMPACTION_LOG_FILE_SIZE_THRESHOLD`<br></br>
 
 ---
 
-> #### hoodie.clean.automatic
-> When enabled, the cleaner table service is invoked immediately after each commit, to delete older file slices. It's recommended to enable this, to ensure metadata and data storage growth is bounded.<br></br>
+> #### hoodie.compaction.preserve.commit.metadata
+> When rewriting data, preserves existing hoodie_commit_time<br></br>
 > **Default Value**: true (Optional)<br></br>
-> `Config Param: AUTO_CLEAN`<br></br>
+> `Config Param: PRESERVE_COMMIT_METADATA`<br></br>
+> `Since Version: 0.11.0`<br></br>
 
 ---
 
-> #### hoodie.commits.archival.batch
-> Archiving of instants is batched in best-effort manner, to pack more instants into a single archive log. This config controls such archival batch size.<br></br>
-> **Default Value**: 10 (Optional)<br></br>
-> `Config Param: COMMITS_ARCHIVAL_BATCH_SIZE`<br></br>
+> #### hoodie.copyonwrite.insert.auto.split
+> Config to control whether we control insert split sizes automatically based on average record sizes. It's recommended to keep this turned on, since hand tuning is otherwise extremely cumbersome.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COPY_ON_WRITE_AUTO_SPLIT_INSERTS`<br></br>
 
 ---
 
-> #### hoodie.compact.inline
-> When set to true, compaction service is triggered after each write. While being simpler operationally, this adds extra latency on the write path.<br></br>
+> #### hoodie.compact.inline.max.delta.commits
+> Number of delta commits after the last compaction, before scheduling of a new compaction is attempted.<br></br>
+> **Default Value**: 5 (Optional)<br></br>
+> `Config Param: INLINE_COMPACT_NUM_DELTA_COMMITS`<br></br>
+
+---
+
+> #### hoodie.record.size.estimation.threshold
+> We use the previous commits' metadata to calculate the estimated record size and use it to bin pack records into partitions. If the previous commit is too small to make an accurate estimation, Hudi will search commits in the reverse order, until we find a commit that has totalBytesWritten larger than (PARQUET_SMALL_FILE_LIMIT_BYTES * this_threshold)<br></br>
+> **Default Value**: 1.0 (Optional)<br></br>
+> `Config Param: RECORD_SIZE_ESTIMATION_THRESHOLD`<br></br>
+
+---
+
+> #### hoodie.compact.inline.trigger.strategy
+> Controls how compaction scheduling is triggered, by time or num delta commits or combination of both. Valid options: NUM_COMMITS,NUM_COMMITS_AFTER_LAST_REQUEST,TIME_ELAPSED,NUM_AND_TIME,NUM_OR_TIME<br></br>
+> **Default Value**: NUM_COMMITS (Optional)<br></br>
+> `Config Param: INLINE_COMPACT_TRIGGER_STRATEGY`<br></br>
+
+---
+
+> #### hoodie.compaction.reverse.log.read
+> HoodieLogFormatReader reads a logfile in the forward direction starting from pos=0 to pos=file_length. If this config is set to true, the reader reads the logfile in reverse direction, from pos=file_length to pos=0<br></br>
 > **Default Value**: false (Optional)<br></br>
-> `Config Param: INLINE_COMPACT`<br></br>
+> `Config Param: COMPACTION_REVERSE_LOG_READ_ENABLE`<br></br>
 
 ---
 
-> #### hoodie.parquet.small.file.limit
-> During upsert operation, we opportunistically expand existing small files on storage, instead of writing new files, to keep number of files to an optimum. This config sets the file size limit below which a file on storage becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file.<br></br>
-> **Default Value**: 104857600 (Optional)<br></br>
-> `Config Param: PARQUET_SMALL_FILE_LIMIT`<br></br>
+> #### hoodie.copyonwrite.insert.split.size
+> Number of inserts assigned for each partition/bucket for writing. We based the default on writing out 100MB files, with at least 1kb records (100K records per file), and over provision to 500K. As long as auto-tuning of splits is turned on, this only affects the first write, where there is no history to learn record sizes from.<br></br>
+> **Default Value**: 500000 (Optional)<br></br>
+> `Config Param: COPY_ON_WRITE_INSERT_SPLIT_SIZE`<br></br>
 
 ---
 
-> #### hoodie.compaction.strategy
-> Compaction strategy decides which file groups are picked up for compaction during each compaction run. By default, Hudi picks the log file with most accumulated unmerged data<br></br>
-> **Default Value**: org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy (Optional)<br></br>
-> `Config Param: COMPACTION_STRATEGY`<br></br>
+> #### hoodie.compact.schedule.inline
+> When set to true, compaction service will be attempted for inline scheduling after each write. Users have to ensure they have a separate job to run async compaction(execution) for the one scheduled by this writer. Users can choose to set both `hoodie.compact.inline` and `hoodie.compact.schedule.inline` to false and have both scheduling and execution triggered by any async process. But if `hoodie.compact.inline` is set to false, and `hoodie.compact.schedule.inline` is set to true, regul [...]
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: SCHEDULE_INLINE_COMPACT`<br></br>
 
 ---
 
-> #### hoodie.archive.automatic
-> When enabled, the archival table service is invoked immediately after each commit, to archive commits if we cross a maximum value of commits. It's recommended to enable this, to ensure number of active commits is bounded.<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: AUTO_ARCHIVE`<br></br>
+> #### hoodie.compaction.daybased.target.partitions
+> Used by org.apache.hudi.io.compact.strategy.DayBasedCompactionStrategy to denote the number of latest partitions to compact during a compaction run.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: TARGET_PARTITIONS_PER_DAYBASED_COMPACTION`<br></br>
 
 ---
 
-> #### hoodie.copyonwrite.insert.auto.split
-> Config to control whether we control insert split sizes automatically based on average record sizes. It's recommended to keep this turned on, since hand tuning is otherwise extremely cumbersome.<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: COPY_ON_WRITE_AUTO_SPLIT_INSERTS`<br></br>
+> #### hoodie.compact.inline
+> When set to true, compaction service is triggered after each write. While being simpler operationally, this adds extra latency on the write path.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INLINE_COMPACT`<br></br>
 
 ---
 
-> #### hoodie.compact.inline.max.delta.commits
-> Number of delta commits after the last compaction, before scheduling of a new compaction is attempted. This is used when the [compaction trigger strategy](/docs/configurations/#hoodiecompactinlinetriggerstrategy) involves number of commits. For example NUM_COMMITS,NUM_AND_TIME,NUM_OR_TIME <br></br>
-> **Default Value**: 5 (Optional)<br></br>
-> `Config Param: INLINE_COMPACT_NUM_DELTA_COMMITS`<br></br>
+### Clean Configs {#Clean-Configs}
+
+Cleaning (reclamation of older/unused file groups/slices).
+
+`Config Class`: org.apache.hudi.config.HoodieCleanConfig<br></br>
+> #### hoodie.cleaner.fileversions.retained
+> When KEEP_LATEST_FILE_VERSIONS cleaning policy is used, the minimum number of file slices to retain in each file group, during cleaning.<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: CLEANER_FILE_VERSIONS_RETAINED`<br></br>
 
 ---
 
-> #### hoodie.keep.min.commits
-> Similar to hoodie.keep.max.commits, but controls the minimum number of instants to retain in the active timeline.<br></br>
-> **Default Value**: 20 (Optional)<br></br>
-> `Config Param: MIN_COMMITS_TO_KEEP`<br></br>
+> #### hoodie.clean.max.commits
+> Number of commits after the last clean operation, before scheduling of a new clean is attempted.<br></br>
+> **Default Value**: 1 (Optional)<br></br>
+> `Config Param: CLEAN_MAX_COMMITS`<br></br>
+
+---
+
+> #### hoodie.clean.allow.multiple
+> Allows scheduling/executing multiple cleans by enabling this config. If users prefer to strictly ensure that clean requests are mutually exclusive, i.e. a second clean will not be scheduled if another clean has not yet completed (to avoid repeat cleaning of the same files), they might want to disable this config.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ALLOW_MULTIPLE_CLEANS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.clean.automatic
+> When enabled, the cleaner table service is invoked immediately after each commit, to delete older file slices. It's recommended to enable this, to ensure metadata and data storage growth is bounded.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: AUTO_CLEAN`<br></br>
 
 ---
 
@@ -537,51 +592,136 @@ Configurations that control compaction (merging of log files onto new base files
 
 ---
 
-> #### hoodie.record.size.estimation.threshold
-> We use the previous commits' metadata to calculate the estimated record size and use it to bin pack records into partitions. If the previous commit is too small to make an accurate estimation, Hudi will search commits in the reverse order, until we find a commit that has totalBytesWritten larger than (PARQUET_SMALL_FILE_LIMIT_BYTES * this_threshold)<br></br>
-> **Default Value**: 1.0 (Optional)<br></br>
-> `Config Param: RECORD_SIZE_ESTIMATION_THRESHOLD`<br></br>
+> #### hoodie.cleaner.incremental.mode
+> When enabled, the plan for each cleaner service run is computed incrementally off the events in the timeline, since the last cleaner run. This is much more efficient than obtaining listings for the full table for each planning (even with a metadata table).<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: CLEANER_INCREMENTAL_MODE_ENABLE`<br></br>
 
 ---
 
-> #### hoodie.compact.inline.trigger.strategy
-> Controls how compaction scheduling is triggered, by time or num delta commits or combination of both. Valid options: NUM_COMMITS,TIME_ELAPSED,NUM_AND_TIME,NUM_OR_TIME<br></br>
+> #### hoodie.clean.async
+> Only applies when hoodie.clean.automatic is turned on. When turned on runs cleaner async with writing, which can speed up overall write performance.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_CLEAN`<br></br>
+
+---
+
+> #### hoodie.clean.trigger.strategy
+> Controls how cleaning is scheduled. Valid options: NUM_COMMITS<br></br>
 > **Default Value**: NUM_COMMITS (Optional)<br></br>
-> `Config Param: INLINE_COMPACT_TRIGGER_STRATEGY`<br></br>
+> `Config Param: CLEAN_TRIGGER_STRATEGY`<br></br>
+
+---
+
+> #### hoodie.cleaner.delete.bootstrap.base.file
+> When set to true, cleaner also deletes the bootstrap base file when its skeleton base file is cleaned. Turn this to true, if you want to ensure the bootstrap dataset storage is reclaimed over time, as the table receives updates/deletes. Another reason to turn this on would be to ensure data residing in bootstrap base files is also physically deleted, to comply with data privacy enforcement processes.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: CLEANER_BOOTSTRAP_BASE_FILE_ENABLE`<br></br>
+
+---
+
+> #### hoodie.cleaner.hours.retained
+> Number of hours for which commits need to be retained. This config provides a more flexible option as compared to number of commits retained for the cleaning service. Setting this property ensures that all the files, but the latest in a file group, corresponding to commits with commit times older than the configured number of hours, are cleaned.<br></br>
+> **Default Value**: 24 (Optional)<br></br>
+> `Config Param: CLEANER_HOURS_RETAINED`<br></br>
+
+---
+
+> #### hoodie.cleaner.commits.retained
+> Number of commits to retain, without cleaning. This translates to retaining data for num_of_commits * time_between_commits (scheduled). This also directly translates into how much data retention the table supports for incremental queries.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: CLEANER_COMMITS_RETAINED`<br></br>
+
+---
+
+> #### hoodie.cleaner.policy.failed.writes
+> Cleaning policy for failed writes to be used. Hudi will delete any files written by failed writes to re-claim space. Choose to perform this rollback of failed writes eagerly before every writer starts (only supported for single writer) or lazily by the cleaner (required for multi-writers)<br></br>
+> **Default Value**: EAGER (Optional)<br></br>
+> `Config Param: FAILED_WRITES_CLEANER_POLICY`<br></br>
+
+---
+
+> #### hoodie.cleaner.policy
+> Cleaning policy to be used. The cleaner service deletes older file slices to re-claim space. By default, the cleaner spares the file slices written by the last N commits, determined by hoodie.cleaner.commits.retained. Long running query plans may often refer to older file slices and will break if those are cleaned before the query has had a chance to run. So, it is good to make sure that the data is retained for more than the maximum query execution time<br></br>
+> **Default Value**: KEEP_LATEST_COMMITS (Optional)<br></br>
+> `Config Param: CLEANER_POLICY`<br></br>
+
+---
+
+### Archival Configs {#Archival-Configs}
+
+Configurations that control archival.
+
+`Config Class`: org.apache.hudi.config.HoodieArchivalConfig<br></br>
+> #### hoodie.archive.merge.small.file.limit.bytes
+> This config sets the archive file size limit below which an archive file becomes a candidate to be selected as such a small file.<br></br>
+> **Default Value**: 20971520 (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_SMALL_FILE_LIMIT_BYTES`<br></br>
 
 ---
 
 > #### hoodie.keep.max.commits
-> Archiving service moves older entries from timeline into an archived log after each write, to keep the metadata overhead constant, even as the table size grows.This config controls the maximum number of instants to retain in the active timeline. <br></br>
+> Archiving service moves older entries from timeline into an archived log after each write, to keep the metadata overhead constant, even as the table size grows. This config controls the maximum number of instants to retain in the active timeline.<br></br>
 > **Default Value**: 30 (Optional)<br></br>
 > `Config Param: MAX_COMMITS_TO_KEEP`<br></br>
 
 ---
 
-> #### hoodie.copyonwrite.insert.split.size
-> Number of inserts assigned for each partition/bucket for writing. We based the default on writing out 100MB files, with at least 1kb records (100K records per file), and over provision to 500K. As long as auto-tuning of splits is turned on, this only affects the first write, where there is no history to learn record sizes from.<br></br>
-> **Default Value**: 500000 (Optional)<br></br>
-> `Config Param: COPY_ON_WRITE_INSERT_SPLIT_SIZE`<br></br>
+> #### hoodie.archive.merge.enable
+> When enabled, Hudi will automatically merge several small archive files into a larger one. It's useful when the storage scheme doesn't support the append operation.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_ENABLE`<br></br>
 
 ---
 
-### File System View Storage Configurations {#File-System-View-Storage-Configurations}
+> #### hoodie.archive.automatic
+> When enabled, the archival table service is invoked immediately after each commit, to archive commits if we cross a maximum value of commits. It's recommended to enable this, to ensure number of active commits is bounded.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: AUTO_ARCHIVE`<br></br>

-Configurations that control how file metadata is stored by Hudi, for transaction processing and queries.
+---

-`Config Class`: org.apache.hudi.common.table.view.FileSystemViewStorageConfig<br></br>
+> #### hoodie.archive.delete.parallelism
+> Parallelism for deleting archived hoodie commits.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: DELETE_ARCHIVED_INSTANT_PARALLELISM_VALUE`<br></br>
 
-> #### hoodie.filesystem.view.type
-> File system view provides APIs for viewing the files on the underlying lake storage, as file groups and file slices. This config controls how such a view is held. Options include MEMORY,SPILLABLE_DISK,EMBEDDED_KV_STORE,REMOTE_ONLY,REMOTE_FIRST which provide different trade offs for memory usage and API request performance.<br></br>
-> **Default Value**: MEMORY (Optional)<br></br>
-> `Config Param: VIEW_TYPE`<br></br>
+---
+
+> #### hoodie.archive.beyond.savepoint
+> If enabled, archival will proceed beyond savepoint, skipping savepoint commits. If disabled, archival will stop at the earliest savepoint commit.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ARCHIVE_BEYOND_SAVEPOINT`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
+> #### hoodie.commits.archival.batch
+> Archiving of instants is batched in best-effort manner, to pack more instants into a single archive log. This config controls such archival batch size.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: COMMITS_ARCHIVAL_BATCH_SIZE`<br></br>
+
+---
+
+> #### hoodie.archive.async
+> Only applies when hoodie.archive.automatic is turned on. When turned on runs archiver async with writing, which can speed up overall write performance.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_ARCHIVE`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.keep.min.commits
+> Similar to hoodie.keep.max.commits, but controls the minimum number of instants to retain in the active timeline.<br></br>
+> **Default Value**: 20 (Optional)<br></br>
+> `Config Param: MIN_COMMITS_TO_KEEP`<br></br>
 
 ---
 
-> #### hoodie.filesystem.view.secondary.type
-> Specifies the secondary form of storage for file system view, if the primary (e.g timeline server) is unavailable.<br></br>
-> **Default Value**: MEMORY (Optional)<br></br>
-> `Config Param: SECONDARY_VIEW_TYPE`<br></br>
+> #### hoodie.archive.merge.files.batch.size
+> The number of small archive files to be merged at once.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_FILES_BATCH_SIZE`<br></br>
 
 ---
 
diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index 6dcb76179b..edffeedfe3 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -4,7 +4,7 @@ keywords: [ configurations, default, flink options, spark, configs, parameters ]
 permalink: /docs/configurations.html
 summary: This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at few levels.
 toc: true
-last_modified_at: 2022-04-30T18:29:54.348
+last_modified_at: 2022-08-12T13:18:38.885
 ---
 
This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at few levels.
@@ -112,6 +112,13 @@ Options useful for reading tables via `read.format.option(...)`
 
 ---
 
+> #### hoodie.schema.on.read.enable
+> Enables support for Schema Evolution feature<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: SCHEMA_EVOLUTION_ENABLED`<br></br>
+
+---
+
 > #### hoodie.datasource.read.begin.instanttime
 > Instant time to start incrementally pulling data from. The instanttime here 
 > need not necessarily correspond to an instant on the timeline. New data 
 > written with an instant_time > BEGIN_INSTANTTIME are fetched out. For e.g: 
 > ‘20170901080000’ will get all new data written after Sep 1, 2017 
 > 08:00AM.<br></br>
 > **Default Value**: N/A (Required)<br></br>
@@ -199,8 +206,8 @@ the dot notation eg: `a.b.c`<br></br>
 ---
 
 > #### hoodie.datasource.hive_sync.partition_extractor_class
-> Class which implements PartitionValueExtractor to extract the partition values, default 'SlashEncodedDayPartitionValueExtractor'.<br></br>
-> **Default Value**: org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor (Optional)<br></br>
+> Class which implements PartitionValueExtractor to extract the partition values, default 'org.apache.hudi.hive.MultiPartKeysValueExtractor'.<br></br>
+> **Default Value**: org.apache.hudi.hive.MultiPartKeysValueExtractor (Optional)<br></br>
 > `Config Param: HIVE_PARTITION_EXTRACTOR_CLASS`<br></br>
 
 ---
@@ -376,7 +383,7 @@ the dot notation eg: `a.b.c`<br></br>
 ---
 
 > #### hoodie.datasource.hive_sync.assume_date_partitioning
-> Assume partitioning is yyyy/mm/dd<br></br>
+> Assume partitioning is yyyy/MM/dd<br></br>
 > **Default Value**: false (Optional)<br></br>
 > `Config Param: HIVE_ASSUME_DATE_PARTITION`<br></br>
 
@@ -580,20 +587,6 @@ These configs control the Hudi Flink SQL source/sink 
connectors, providing abili
 Flink jobs using the SQL can be configured through the options in WITH clause. 
The actual datasource level configs are listed below.
 
 `Config Class`: org.apache.hudi.configuration.FlinkOptions<br></br>
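
As the intro above notes, Flink jobs pass these options through the SQL WITH clause. A minimal sketch of a streaming-read table definition — the table name, schema, and path are illustrative, and the option values are examples only:

```sql
-- Illustrative Flink SQL table using FlinkOptions from this section.
CREATE TABLE hudi_table (
  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_table',          -- illustrative path
  'table.type' = 'MERGE_ON_READ',
  'read.streaming.enabled' = 'true',          -- read as streaming source
  'read.streaming.check-interval' = '60'      -- poll interval in seconds
);
```

Any option listed under `FlinkOptions` below can be supplied the same way, as a quoted key/value pair in the WITH clause.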
-> #### read.streaming.enabled
-> Whether to read as streaming source, default false<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: READ_AS_STREAMING`<br></br>
-
----
-
-> #### hoodie.datasource.write.keygenerator.type
-> Key generator type, that implements will extract the key out of incoming 
record<br></br>
-> **Default Value**: SIMPLE (Optional)<br></br>
-> `Config Param: KEYGEN_TYPE`<br></br>
-
----
-
 > #### compaction.trigger.strategy
 > Strategy to trigger compaction, options are 'num_commits': trigger 
 > compaction when reach N delta commits;
 'time_elapsed': trigger compaction when time elapsed &gt; N seconds since last 
compaction;
@@ -612,21 +605,6 @@ Default is 'num_commits'<br></br>
 
 ---
 
-> #### compaction.max_memory
-> Max memory in MB for compaction spillable map, default 100MB<br></br>
-> **Default Value**: 100 (Optional)<br></br>
-> `Config Param: COMPACTION_MAX_MEMORY`<br></br>
-
----
-
-> #### hive_sync.support_timestamp
-> INT64 with original type TIMESTAMP_MICROS is converted to hive timestamp 
type.
-Disabled by default for backward compatibility.<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: HIVE_SYNC_SUPPORT_TIMESTAMP`<br></br>
-
----
-
 > #### hive_sync.serde_properties
 > Serde properties to hive table, the data format is k1=v1
 k2=v2<br></br>
@@ -635,34 +613,6 @@ k2=v2<br></br>
 
 ---
 
-> #### hive_sync.skip_ro_suffix
-> Skip the _ro suffix for Read optimized table when registering, default 
false<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: HIVE_SYNC_SKIP_RO_SUFFIX`<br></br>
-
----
-
-> #### metadata.compaction.delta_commits
-> Max delta commits for metadata table to trigger compaction, default 
10<br></br>
-> **Default Value**: 10 (Optional)<br></br>
-> `Config Param: METADATA_COMPACTION_DELTA_COMMITS`<br></br>
-
----
-
-> #### hive_sync.assume_date_partitioning
-> Assume partitioning is yyyy/mm/dd, default false<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: HIVE_SYNC_ASSUME_DATE_PARTITION`<br></br>
-
----
-
-> #### write.parquet.block.size
-> Parquet RowGroup size. It's recommended to make this large enough that scan 
costs can be amortized by packing enough column values into a single row 
group.<br></br>
-> **Default Value**: 120 (Optional)<br></br>
-> `Config Param: WRITE_PARQUET_BLOCK_SIZE`<br></br>
-
----
-
 > #### hive_sync.table
 > Table name for hive sync, default 'unknown'<br></br>
 > **Default Value**: unknown (Optional)<br></br>
@@ -729,31 +679,6 @@ By default false (the names of partition folders are only 
partition values)<br><
 
 ---
 
-> #### hive_sync.enable
-> Asynchronously sync Hive meta to HMS, default false<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: HIVE_SYNC_ENABLED`<br></br>
-
----
-
-> #### changelog.enabled
-> Whether to keep all the intermediate changes, we try to keep all the changes 
of a record when enabled:
-1). The sink accept the UPDATE_BEFORE message;
-2). The source try to emit every changes of a record.
-The semantics is best effort because the compaction job would finally merge 
all changes of a record into one.
- default false to have UPSERT semantics<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: CHANGELOG_ENABLED`<br></br>
-
----
-
-> #### read.streaming.check-interval
-> Check interval for streaming read of SECOND, default 1 minute<br></br>
-> **Default Value**: 60 (Optional)<br></br>
-> `Config Param: READ_STREAMING_CHECK_INTERVAL`<br></br>
-
----
-
 > #### write.bulk_insert.shuffle_input
 > Whether to shuffle the inputs by specific fields for bulk insert tasks, 
 > default true<br></br>
 > **Default Value**: true (Optional)<br></br>
@@ -761,16 +686,6 @@ The semantics is best effort because the compaction job 
would finally merge all
 
 ---
 
-> #### hoodie.datasource.merge.type
-> For Snapshot query on merge on read table. Use this key to define how the 
payloads are merged, in
-1) skip_merge: read the base file records plus the log file records;
-2) payload_combine: read the base file records first, for each record in base 
file, checks whether the key is in the
-   log file records(combines the two records with same key for base and log 
file records), then read the left log file records<br></br>
-> **Default Value**: payload_combine (Optional)<br></br>
-> `Config Param: MERGE_TYPE`<br></br>
-
----
-
 > #### write.retry.times
 > Flag to indicate how many times streaming job should retry for a failed 
 > checkpoint batch.
 By default 3<br></br>
@@ -780,19 +695,12 @@ By default 3<br></br>
 ---
 
 > #### metadata.enabled
-> Enable the internal metadata table which serves table metadata like level 
file listings, default false<br></br>
+> Enable the internal metadata table which serves table metadata like level 
file listings, default disabled<br></br>
 > **Default Value**: false (Optional)<br></br>
 > `Config Param: METADATA_ENABLED`<br></br>
 
 ---
 
-> #### read.tasks
-> Parallelism of tasks that do actual read, default is 4<br></br>
-> **Default Value**: 4 (Optional)<br></br>
-> `Config Param: READ_TASKS`<br></br>
-
----
-
 > #### write.parquet.max.file.size
 > Target size for parquet files produced by Hudi write phases. For DFS, this 
 > needs to be aligned with the underlying filesystem block size for optimal 
 > performance.<br></br>
 > **Default Value**: 120 (Optional)<br></br>
@@ -800,6 +708,13 @@ By default 3<br></br>
 
 ---
 
+> #### clustering.plan.strategy.daybased.skipfromlatest.partitions
+> Number of partitions to skip from latest when choosing partitions to create 
ClusteringPlan<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: CLUSTERING_PLAN_STRATEGY_SKIP_PARTITIONS_FROM_LATEST`<br></br>
+
+---
+
 > #### hoodie.bucket.index.hash.field
 > Index key field. Value to be used as hashing to find the bucket ID. Should 
 > be a subset of or equal to the recordKey fields.
 Actual value will be obtained by invoking .toString() on the field value. 
Nested fields can be specified using the dot notation eg: `a.b.c`<br></br>
@@ -815,27 +730,6 @@ Actual value will be obtained by invoking .toString() on 
the field value. Nested
 
 ---
 
-> #### read.end-commit
-> End commit instant for reading, the commit time format should be 
'yyyyMMddHHmmss'<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: READ_END_COMMIT`<br></br>
-
----
-
-> #### write.log.max.size
-> Maximum size allowed in MB for a log file before it is rolled over to the 
next version, default 1GB<br></br>
-> **Default Value**: 1024 (Optional)<br></br>
-> `Config Param: WRITE_LOG_MAX_SIZE`<br></br>
-
----
-
-> #### hive_sync.file_format
-> File format for hive sync, default 'PARQUET'<br></br>
-> **Default Value**: PARQUET (Optional)<br></br>
-> `Config Param: HIVE_SYNC_FILE_FORMAT`<br></br>
-
----
-
 > #### hive_sync.mode
 > Mode to choose for Hive ops. Valid values are hms, jdbc and hiveql, default 
 > 'jdbc'<br></br>
 > **Default Value**: jdbc (Optional)<br></br>
@@ -860,31 +754,31 @@ By default 2000 and it will be doubled by every 
retry<br></br>
 
 ---
 
-> #### hive_sync.db
-> Database name for hive sync, default 'default'<br></br>
-> **Default Value**: default (Optional)<br></br>
-> `Config Param: HIVE_SYNC_DB`<br></br>
+> #### clustering.async.enabled
+> Async Clustering, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: CLUSTERING_ASYNC_ENABLED`<br></br>
 
 ---
 
-> #### index.type
-> Index type of Flink write job, default is using state backed index.<br></br>
-> **Default Value**: FLINK_STATE (Optional)<br></br>
-> `Config Param: INDEX_TYPE`<br></br>
+> #### clustering.plan.partition.filter.mode
+> Partition filter mode used in the creation of clustering plan. Available 
values are - NONE: do not filter table partitions, and thus the clustering plan 
will include all partitions that have clustering candidates. RECENT_DAYS: keep a 
continuous range of partitions, works together with configs 
'hoodie.clustering.plan.strategy.daybased.lookback.partitions' and 
'hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions'. SELECTED_PARTITIONS:
 keep partitions that are in the specified r [...]
+> **Default Value**: NONE (Optional)<br></br>
+> `Config Param: CLUSTERING_PLAN_PARTITION_FILTER_MODE_NAME`<br></br>
 
 ---
 
-> #### hive_sync.password
-> Password for hive sync, default 'hive'<br></br>
-> **Default Value**: hive (Optional)<br></br>
-> `Config Param: HIVE_SYNC_PASSWORD`<br></br>
+> #### hive_sync.db
+> Database name for hive sync, default 'default'<br></br>
+> **Default Value**: default (Optional)<br></br>
+> `Config Param: HIVE_SYNC_DB`<br></br>
 
 ---
 
-> #### hive_sync.use_jdbc
-> Use JDBC when hive synchronization is enabled, default true<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: HIVE_SYNC_USE_JDBC`<br></br>
+> #### clustering.plan.strategy.sort.columns
+> Columns to sort the data by when clustering<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: CLUSTERING_SORT_COLUMNS`<br></br>
 
 ---
 
@@ -895,13 +789,6 @@ By default 2000 and it will be doubled by every 
retry<br></br>
 
 ---
 
-> #### hive_sync.jdbc_url
-> Jdbc URL for hive sync, default 'jdbc:hive2://localhost:10000'<br></br>
-> **Default Value**: jdbc:hive2://localhost:10000 (Optional)<br></br>
-> `Config Param: HIVE_SYNC_JDBC_URL`<br></br>
-
----
-
 > #### hive_sync.partition_extractor_class
 > Tool to extract the partition value from HDFS path, default 
 > 'SlashEncodedDayPartitionValueExtractor'<br></br>
 > **Default Value**: 
 > org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor 
 > (Optional)<br></br>
@@ -909,13 +796,6 @@ By default 2000 and it will be doubled by every 
retry<br></br>
 
 ---
 
-> #### read.start-commit
-> Start commit instant for reading, the commit time format should be 
'yyyyMMddHHmmss', by default reading from the latest instant for streaming 
read<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: READ_START_COMMIT`<br></br>
-
----
-
 > #### write.precombine
 > Flag to indicate whether to drop duplicates before insert/upsert.
 By default these cases will accept duplicates, to gain extra performance:
@@ -933,16 +813,9 @@ By default these cases will accept duplicates, to gain 
extra performance:
 
 ---
 
-> #### archive.min_commits
-> Min number of commits to keep before archiving older commits into a 
sequential log, default 40<br></br>
-> **Default Value**: 40 (Optional)<br></br>
-> `Config Param: ARCHIVE_MIN_COMMITS`<br></br>
-
----
-
 > #### hoodie.datasource.write.keygenerator.class
 > Key generator class, that implements will extract the key out of incoming 
 > record<br></br>
-> **Default Value**:  (Optional)<br></br>
+> **Default Value**: N/A (Required)<br></br>
 > `Config Param: KEYGEN_CLASS_NAME`<br></br>
 
 ---
@@ -955,17 +828,10 @@ if same key record with different partition path came in, 
default true<br></br>
 
 ---
 
-> #### index.partition.regex
-> Whether to load partitions in state if partition path matching, default 
`*`<br></br>
-> **Default Value**: .* (Optional)<br></br>
-> `Config Param: INDEX_PARTITION_REGEX`<br></br>
-
----
-
-> #### hoodie.table.name
-> Table name to register to Hive metastore<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: TABLE_NAME`<br></br>
+> #### clustering.delta_commits
+> Max delta commits needed to trigger clustering, default 4 commits<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: CLUSTERING_DELTA_COMMITS`<br></br>
 
 ---
 
@@ -995,13 +861,6 @@ there are two cases that this option can be used to avoid 
reading duplicates:
 
 ---
 
-> #### hoodie.datasource.write.partitionpath.urlencode
-> Whether to encode the partition path url, default false<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: URL_ENCODE_PARTITIONING`<br></br>
-
----
-
 > #### compaction.async.enabled
 > Async Compaction, enabled by default for MOR<br></br>
 > **Default Value**: true (Optional)<br></br>
@@ -1062,13 +921,6 @@ Actual value obtained by invoking .toString(), default 
''<br></br>
 
 ---
 
-> #### source.avro-schema.path
-> Source avro schema file path, the parsed schema is used for 
deserialization<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: SOURCE_AVRO_SCHEMA_PATH`<br></br>
-
----
-
 > #### compaction.delta_commits
 > Max delta commits needed to trigger compaction, default 5 commits<br></br>
 > **Default Value**: 5 (Optional)<br></br>
@@ -1076,16 +928,9 @@ Actual value obtained by invoking .toString(), default 
''<br></br>
 
 ---
 
-> #### write.insert.cluster
-> Whether to merge small files for insert mode, if true, the write throughput 
will decrease because the read/write of existing small file, only valid for COW 
table, default false<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: INSERT_CLUSTER`<br></br>
-
----
-
 > #### partition.default_name
 > The default partition name in case the dynamic partition column value is 
 > null/empty string<br></br>
-> **Default Value**: default (Optional)<br></br>
+> **Default Value**: __HIVE_DEFAULT_PARTITION__ (Optional)<br></br>
 > `Config Param: PARTITION_DEFAULT_NAME`<br></br>
 
 ---
@@ -1097,10 +942,17 @@ Actual value obtained by invoking .toString(), default 
''<br></br>
 
 ---
 
-> #### source.avro-schema
-> Source avro schema string, the parsed schema is used for 
deserialization<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: SOURCE_AVRO_SCHEMA`<br></br>
+> #### clustering.plan.strategy.small.file.limit
+> Files smaller than the size specified here are candidates for clustering, 
default 600 MB<br></br>
+> **Default Value**: 600 (Optional)<br></br>
+> `Config Param: CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT`<br></br>
+
+---
+
+> #### clustering.schedule.enabled
+> Schedule the cluster plan, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: CLUSTERING_SCHEDULE_ENABLED`<br></br>
 
 ---
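
As a sketch, the clustering options above can be combined in a Flink SQL WITH clause to schedule and run clustering asynchronously — table name, path, and values here are illustrative:

```sql
-- Illustrative Flink SQL table with async clustering enabled.
CREATE TABLE hudi_clustered (
  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_clustered',               -- illustrative path
  'table.type' = 'COPY_ON_WRITE',
  'clustering.schedule.enabled' = 'true',              -- schedule cluster plans
  'clustering.async.enabled' = 'true',                 -- execute them async
  'clustering.plan.strategy.small.file.limit' = '600', -- MB, candidates below this
  'clustering.plan.strategy.sort.columns' = 'name'     -- sort key for clustering
);
```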
 
@@ -1111,10 +963,10 @@ Actual value obtained by invoking .toString(), default 
''<br></br>
 
 ---
 
-> #### write.rate.limit
-> Write record rate limit per second to prevent traffic jitter and improve 
stability, default 0 (no limit)<br></br>
-> **Default Value**: 0 (Optional)<br></br>
-> `Config Param: WRITE_RATE_LIMIT`<br></br>
+> #### clustering.plan.strategy.class
+> Config to provide a strategy class (subclass of ClusteringPlanStrategy) to 
create clustering plan i.e select what file groups are being clustered. Default 
strategy, looks at the last N (determined by 
clustering.plan.strategy.daybased.lookback.partitions) day based partitions 
picks the small file slices within those partitions.<br></br>
+> **Default Value**: 
org.apache.hudi.client.clustering.plan.strategy.FlinkSizeBasedClusteringPlanStrategy
 (Optional)<br></br>
+> `Config Param: CLUSTERING_PLAN_STRATEGY_CLASS`<br></br>
 
 ---
 
@@ -1147,13 +999,6 @@ This also directly translates into how much you can 
incrementally pull on this t
 
 ---
 
-> #### read.utc-timezone
-> Use UTC timezone or local timezone to the conversion between epoch time and 
LocalDateTime. Hive 0.x/1.x/2.x use local timezone. But Hive 3.x use UTC 
timezone, by default true<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: UTC_TIMEZONE`<br></br>
-
----
-
 > #### archive.max_commits
 > Max number of commits to keep before archiving older commits into a 
 > sequential log, default 50<br></br>
 > **Default Value**: 50 (Optional)<br></br>
@@ -1161,26 +1006,6 @@ This also directly translates into how much you can 
incrementally pull on this t
 
 ---
 
-> #### hoodie.datasource.query.type
-> Decides how data files need to be read, in
-1) Snapshot mode (obtain latest view, based on row &amp; columnar data);
-2) incremental mode (new data since an instantTime);
-3) Read Optimized mode (obtain latest view, based on columnar data)
-.Default: snapshot<br></br>
-> **Default Value**: snapshot (Optional)<br></br>
-> `Config Param: QUERY_TYPE`<br></br>
-
----
-
-> #### write.precombine.field
-> Field used in preCombining before actual write. When two records have the 
same
-key value, we will pick the one with the largest value for the precombine 
field,
-determined by Object.compareTo(..)<br></br>
-> **Default Value**: ts (Optional)<br></br>
-> `Config Param: PRECOMBINE_FIELD`<br></br>
-
----
-
 > #### write.index_bootstrap.tasks
 > Parallelism of tasks that do index bootstrap, default is the parallelism of 
 > the execution environment<br></br>
 > **Default Value**: N/A (Required)<br></br>
@@ -1204,13 +1029,6 @@ Actual value will be obtained by invoking .toString() on 
the field value. Nested
 
 ---
 
-> #### write.parquet.page.size
-> Parquet page size. Page is the unit of read within a parquet file. Within a 
block, pages are compressed separately.<br></br>
-> **Default Value**: 1 (Optional)<br></br>
-> `Config Param: WRITE_PARQUET_PAGE_SIZE`<br></br>
-
----
-
 > #### compaction.delta_seconds
 > Max delta seconds time needed to trigger compaction, default 1 hour<br></br>
 > **Default Value**: 3600 (Optional)<br></br>
@@ -1218,17 +1036,325 @@ Actual value will be obtained by invoking .toString() 
on the field value. Nested
 
 ---
 
-> #### hive_sync.metastore.uris
-> Metastore uris for hive sync, default ''<br></br>
+> #### hive_sync.partition_fields
+> Partition fields for hive sync, default ''<br></br>
 > **Default Value**:  (Optional)<br></br>
-> `Config Param: HIVE_SYNC_METASTORE_URIS`<br></br>
+> `Config Param: HIVE_SYNC_PARTITION_FIELDS`<br></br>
 
 ---
 
-> #### hive_sync.partition_fields
-> Partition fields for hive sync, default ''<br></br>
+> #### read.streaming.enabled
+> Whether to read as streaming source, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: READ_AS_STREAMING`<br></br>
+
+---
+
+> #### hoodie.datasource.write.keygenerator.type
+> Key generator type, that implements will extract the key out of incoming 
record<br></br>
+> **Default Value**: SIMPLE (Optional)<br></br>
+> `Config Param: KEYGEN_TYPE`<br></br>
+
+---
+
+> #### clean.retain_file_versions
+> Number of file versions to retain. default 5<br></br>
+> **Default Value**: 5 (Optional)<br></br>
+> `Config Param: CLEAN_RETAIN_FILE_VERSIONS`<br></br>
+
+---
+
+> #### compaction.max_memory
+> Max memory in MB for compaction spillable map, default 100MB<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: COMPACTION_MAX_MEMORY`<br></br>
+
+---
+
+> #### hive_sync.support_timestamp
+> INT64 with original type TIMESTAMP_MICROS is converted to hive timestamp 
type.
+Disabled by default for backward compatibility.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: HIVE_SYNC_SUPPORT_TIMESTAMP`<br></br>
+
+---
+
+> #### hive_sync.skip_ro_suffix
+> Skip the _ro suffix for Read optimized table when registering, default 
false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_SKIP_RO_SUFFIX`<br></br>
+
+---
+
+> #### metadata.compaction.delta_commits
+> Max delta commits for metadata table to trigger compaction, default 
10<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: METADATA_COMPACTION_DELTA_COMMITS`<br></br>
+
+---
+
+> #### hive_sync.assume_date_partitioning
+> Assume partitioning is yyyy/MM/dd, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_ASSUME_DATE_PARTITION`<br></br>
+
+---
+
+> #### write.parquet.block.size
+> Parquet RowGroup size. It's recommended to make this large enough that scan 
costs can be amortized by packing enough column values into a single row 
group.<br></br>
+> **Default Value**: 120 (Optional)<br></br>
+> `Config Param: WRITE_PARQUET_BLOCK_SIZE`<br></br>
+
+---
+
+> #### clustering.plan.strategy.target.file.max.bytes
+> Each group can produce 'N' 
(CLUSTERING_MAX_GROUP_SIZE/CLUSTERING_TARGET_FILE_SIZE) output file groups, 
default 1 GB<br></br>
+> **Default Value**: 1073741824 (Optional)<br></br>
+> `Config Param: CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES`<br></br>
+
+---
+
+> #### clustering.tasks
+> Parallelism of tasks that do actual clustering, default is 4<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: CLUSTERING_TASKS`<br></br>
+
+---
+
+> #### hive_sync.enable
+> Asynchronously sync Hive meta to HMS, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_ENABLED`<br></br>
+
+---
+
+> #### changelog.enabled
+> Whether to keep all the intermediate changes. We try to keep all the changes 
of a record when enabled:
+1) The sink accepts the UPDATE_BEFORE message;
+2) The source tries to emit every change of a record.
+The semantics are best effort because the compaction job would finally merge 
all changes of a record into one.
+ Default is false, which gives UPSERT semantics<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: CHANGELOG_ENABLED`<br></br>
+
+---
+
+> #### read.streaming.check-interval
+> Check interval for streaming read of SECOND, default 1 minute<br></br>
+> **Default Value**: 60 (Optional)<br></br>
+> `Config Param: READ_STREAMING_CHECK_INTERVAL`<br></br>
+
+---
+
+> #### hoodie.datasource.merge.type
+> For Snapshot query on merge on read table. Use this key to define how the 
payloads are merged, in
+1) skip_merge: read the base file records plus the log file records;
+2) payload_combine: read the base file records first, for each record in base 
file, checks whether the key is in the
+   log file records(combines the two records with same key for base and log 
file records), then read the left log file records<br></br>
+> **Default Value**: payload_combine (Optional)<br></br>
+> `Config Param: MERGE_TYPE`<br></br>
+
+---
+
+> #### read.tasks
+> Parallelism of tasks that do actual read, default is 4<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: READ_TASKS`<br></br>
+
+---
+
+> #### read.end-commit
+> End commit instant for reading, the commit time format should be 
'yyyyMMddHHmmss'<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: READ_END_COMMIT`<br></br>
+
+---
+
+> #### write.log.max.size
+> Maximum size allowed in MB for a log file before it is rolled over to the 
next version, default 1GB<br></br>
+> **Default Value**: 1024 (Optional)<br></br>
+> `Config Param: WRITE_LOG_MAX_SIZE`<br></br>
+
+---
+
+> #### clustering.plan.strategy.daybased.lookback.partitions
+> Number of partitions to list to create ClusteringPlan, default is 2<br></br>
+> **Default Value**: 2 (Optional)<br></br>
+> `Config Param: CLUSTERING_TARGET_PARTITIONS`<br></br>
+
+---
+
+> #### hive_sync.file_format
+> File format for hive sync, default 'PARQUET'<br></br>
+> **Default Value**: PARQUET (Optional)<br></br>
+> `Config Param: HIVE_SYNC_FILE_FORMAT`<br></br>
+
+---
+
+> #### clustering.plan.strategy.max.num.groups
+> Maximum number of groups to create as part of ClusteringPlan. Increasing 
groups will increase parallelism, default is 30<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: CLUSTERING_MAX_NUM_GROUPS`<br></br>
+
+---
+
+> #### index.type
+> Index type of Flink write job, default is using state backed index.<br></br>
+> **Default Value**: FLINK_STATE (Optional)<br></br>
+> `Config Param: INDEX_TYPE`<br></br>
+
+---
+
+> #### read.data.skipping.enabled
+> Enables data skipping, allowing queries to leverage indexes to reduce the 
search space by skipping over files<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: READ_DATA_SKIPPING_ENABLED`<br></br>
+
+---
+
+> #### clean.policy
+> Clean policy to manage the Hudi table. Available options: 
KEEP_LATEST_COMMITS, KEEP_LATEST_FILE_VERSIONS, KEEP_LATEST_BY_HOURS. Default is 
KEEP_LATEST_COMMITS.<br></br>
+> **Default Value**: KEEP_LATEST_COMMITS (Optional)<br></br>
+> `Config Param: CLEAN_POLICY`<br></br>
+
+---
+
+> #### hive_sync.password
+> Password for hive sync, default 'hive'<br></br>
+> **Default Value**: hive (Optional)<br></br>
+> `Config Param: HIVE_SYNC_PASSWORD`<br></br>
+
+---
+
+> #### hive_sync.use_jdbc
+> Use JDBC when hive synchronization is enabled, default true<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: HIVE_SYNC_USE_JDBC`<br></br>
+
+---
+
+> #### hive_sync.jdbc_url
+> Jdbc URL for hive sync, default 'jdbc:hive2://localhost:10000'<br></br>
+> **Default Value**: jdbc:hive2://localhost:10000 (Optional)<br></br>
+> `Config Param: HIVE_SYNC_JDBC_URL`<br></br>
+
+---
+
+> #### read.start-commit
+> Start commit instant for reading, the commit time format should be 
'yyyyMMddHHmmss', by default reading from the latest instant for streaming 
read<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: READ_START_COMMIT`<br></br>
+
+---
+
+> #### archive.min_commits
+> Min number of commits to keep before archiving older commits into a 
sequential log, default 40<br></br>
+> **Default Value**: 40 (Optional)<br></br>
+> `Config Param: ARCHIVE_MIN_COMMITS`<br></br>
+
+---
+
+> #### index.partition.regex
+> Whether to load partitions in state if partition path matching, default 
`*`<br></br>
+> **Default Value**: .* (Optional)<br></br>
+> `Config Param: INDEX_PARTITION_REGEX`<br></br>
+
+---
+
+> #### hoodie.table.name
+> Table name to register to Hive metastore<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TABLE_NAME`<br></br>
+
+---
+
+> #### hoodie.datasource.write.partitionpath.urlencode
+> Whether to encode the partition path url, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: URL_ENCODE_PARTITIONING`<br></br>
+
+---
+
+> #### source.avro-schema.path
+> Source avro schema file path, the parsed schema is used for 
deserialization<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: SOURCE_AVRO_SCHEMA_PATH`<br></br>
+
+---
+
+> #### write.insert.cluster
+> Whether to merge small files for insert mode. If true, the write throughput 
will decrease because of the read/write of existing small files. Only valid for 
COW table, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INSERT_CLUSTER`<br></br>
+
+---
+
+> #### source.avro-schema
+> Source avro schema string, the parsed schema is used for 
deserialization<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: SOURCE_AVRO_SCHEMA`<br></br>
+
+---
+
+> #### hive_sync.conf.dir
+> The hive configuration directory, where the hive-site.xml lies in, the file 
should be put on the client machine<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_SYNC_CONF_DIR`<br></br>
+
+---
+
+> #### write.rate.limit
+> Write record rate limit per second to prevent traffic jitter and improve 
stability, default 0 (no limit)<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: WRITE_RATE_LIMIT`<br></br>
+
+---
+
+> #### clean.retain_hours
+> Number of hours for which commits need to be retained. This config provides 
a more flexible option as compared to the number of commits retained for the 
cleaning service. Setting this property ensures that all files, except the 
latest in each file group, corresponding to commits older than the configured 
number of hours are cleaned.<br></br>
+> **Default Value**: 24 (Optional)<br></br>
+> `Config Param: CLEAN_RETAIN_HOURS`<br></br>
+
+---
+
+> #### read.utc-timezone
+> Use UTC timezone or local timezone for the conversion between epoch time and 
LocalDateTime. Hive 0.x/1.x/2.x use local timezone, but Hive 3.x uses UTC 
timezone. By default true<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: UTC_TIMEZONE`<br></br>
+
+---
+
+> #### hoodie.datasource.query.type
+> Decides how data files need to be read, in
+1) Snapshot mode (obtain latest view, based on row &amp; columnar data);
+2) incremental mode (new data since an instantTime);
+3) Read Optimized mode (obtain latest view, based on columnar data).
+Default: snapshot<br></br>
+> **Default Value**: snapshot (Optional)<br></br>
+> `Config Param: QUERY_TYPE`<br></br>
+
+---
+
+> #### write.precombine.field
+> Field used in preCombining before actual write. When two records have the 
same
+key value, we will pick the one with the largest value for the precombine 
field,
+determined by Object.compareTo(..)<br></br>
+> **Default Value**: ts (Optional)<br></br>
+> `Config Param: PRECOMBINE_FIELD`<br></br>
+
+---
+
+> #### write.parquet.page.size
+> Parquet page size. Page is the unit of read within a parquet file. Within a 
block, pages are compressed separately.<br></br>
+> **Default Value**: 1 (Optional)<br></br>
+> `Config Param: WRITE_PARQUET_PAGE_SIZE`<br></br>
+
+---
+
+> #### hive_sync.metastore.uris
+> Metastore uris for hive sync, default ''<br></br>
 > **Default Value**:  (Optional)<br></br>
-> `Config Param: HIVE_SYNC_PARTITION_FIELDS`<br></br>
+> `Config Param: HIVE_SYNC_METASTORE_URIS`<br></br>
 
 ---
 
@@ -1306,6 +1432,136 @@ Controls callback behavior into HTTP endpoints, to push 
 notifications on commit
 
 ---
 
+### Clean Configs {#Clean-Configs}
+
+Cleaning (reclamation of older/unused file groups/slices).
+
+`Config Class`: org.apache.hudi.config.HoodieCleanConfig<br></br>
+> #### hoodie.cleaner.fileversions.retained
+> When KEEP_LATEST_FILE_VERSIONS cleaning policy is used,  the minimum number 
of file slices to retain in each file group, during cleaning.<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: CLEANER_FILE_VERSIONS_RETAINED`<br></br>
+
+---
+
+> #### hoodie.clean.max.commits
+> Number of commits after the last clean operation, before scheduling of a new 
clean is attempted.<br></br>
+> **Default Value**: 1 (Optional)<br></br>
+> `Config Param: CLEAN_MAX_COMMITS`<br></br>
+
+---
+
+> #### hoodie.clean.allow.multiple
+> Allows scheduling/executing multiple cleans by enabling this config. If 
users prefer to strictly ensure clean requests are mutually exclusive, 
i.e. a 2nd clean will not be scheduled if another clean is not yet completed, 
to avoid repeat cleaning of the same files, they might want to disable this 
config.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ALLOW_MULTIPLE_CLEANS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.clean.automatic
+> When enabled, the cleaner table service is invoked immediately after each 
commit, to delete older file slices. It's recommended to enable this, to ensure 
metadata and data storage growth is bounded.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: AUTO_CLEAN`<br></br>
+
+---
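
As a sketch, these clean configs can be set per table when using Hudi's Spark SQL support, which accepts `hoodie.*` configs through table properties — the table name, schema, and values are illustrative:

```sql
-- Illustrative Spark SQL DDL passing clean configs as table properties.
CREATE TABLE hudi_cow (
  id INT,
  name STRING,
  ts BIGINT
) USING hudi
TBLPROPERTIES (
  primaryKey = 'id',
  preCombineField = 'ts',
  'hoodie.clean.automatic' = 'true',           -- run cleaner after each commit
  'hoodie.cleaner.commits.retained' = '10'     -- keep last 10 commits uncleaned
);
```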
+
+> #### hoodie.cleaner.parallelism
+> Parallelism for the cleaning operation. Increase this if cleaning becomes 
slow.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: CLEANER_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.cleaner.incremental.mode
+> When enabled, the plans for each cleaner service run are computed 
incrementally off the events in the timeline, since the last cleaner run. This 
is much more efficient than obtaining listings for the full table for each 
planning (even with a metadata table).<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: CLEANER_INCREMENTAL_MODE_ENABLE`<br></br>
+
+---
+
+> #### hoodie.clean.async
+> Only applies when hoodie.clean.automatic is turned on. When turned on, runs 
the cleaner asynchronously with writing, which can speed up overall write 
performance.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_CLEAN`<br></br>
+
+---
+
+> #### hoodie.clean.trigger.strategy
+> Controls how cleaning is scheduled. Valid options: NUM_COMMITS<br></br>
+> **Default Value**: NUM_COMMITS (Optional)<br></br>
+> `Config Param: CLEAN_TRIGGER_STRATEGY`<br></br>
+
+---
+
+> #### hoodie.cleaner.delete.bootstrap.base.file
+> When set to true, the cleaner also deletes the bootstrap base file when its 
skeleton base file is cleaned. Set this to true if you want to ensure the 
bootstrap dataset storage is reclaimed over time, as the table receives 
updates/deletes. Another reason to turn this on is to ensure data residing in 
bootstrap base files is also physically deleted, to comply with data privacy 
enforcement processes.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: CLEANER_BOOTSTRAP_BASE_FILE_ENABLE`<br></br>
+
+---
+
+> #### hoodie.cleaner.hours.retained
+> Number of hours for which commits need to be retained. This config provides 
a more flexible option as compared to number of commits retained for the 
cleaning service. Setting this property ensures that all files (but the latest 
in a file group) corresponding to commits with commit times older than the 
configured number of hours are cleaned.<br></br>
+> **Default Value**: 24 (Optional)<br></br>
+> `Config Param: CLEANER_HOURS_RETAINED`<br></br>
+
+---
+
+> #### hoodie.cleaner.commits.retained
+> Number of commits to retain, without cleaning. Data will be retained for 
num_of_commits * time_between_commits (scheduled). This also directly 
translates into how much data retention the table supports for incremental 
queries.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: CLEANER_COMMITS_RETAINED`<br></br>
+
+---
+
+> #### hoodie.cleaner.policy.failed.writes
+> Cleaning policy for failed writes to be used. Hudi will delete any files 
written by failed writes to re-claim space. Choose to perform this rollback of 
failed writes eagerly before every writer starts (only supported for single 
writer) or lazily by the cleaner (required for multi-writers)<br></br>
+> **Default Value**: EAGER (Optional)<br></br>
+> `Config Param: FAILED_WRITES_CLEANER_POLICY`<br></br>
+
+---
+
+> #### hoodie.cleaner.policy
+> Cleaning policy to be used. The cleaner service deletes older file slices 
to re-claim space. By default, the cleaner spares the file slices written by 
the last N commits, determined by hoodie.cleaner.commits.retained. Long-running 
query plans may often refer to older file slices and will break if those are 
cleaned before the query has had a chance to run. So, it is good to make sure 
that the data is retained for more than the maximum query execution 
time.<br></br>
+> **Default Value**: KEEP_LATEST_COMMITS (Optional)<br></br>
+> `Config Param: CLEANER_POLICY`<br></br>
+
+---
+
+### Metastore Configs {#Metastore-Configs}
+
+Configurations used by the Hudi Metastore.
+
+`Config Class`: org.apache.hudi.common.config.HoodieMetastoreConfig<br></br>
+> #### hoodie.metastore.uris
+> Metastore server URIs<br></br>
+> **Default Value**: thrift://localhost:9090 (Optional)<br></br>
+> `Config Param: METASTORE_URLS`<br></br>
+
+---
+
+> #### hoodie.metastore.enable
+> Use metastore server to store hoodie table metadata<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: METASTORE_ENABLE`<br></br>
+
+---
+
+> #### hoodie.metastore.connect.retries
+> Number of retries while opening a connection to metastore<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: METASTORE_CONNECTION_RETRIES`<br></br>
+
+---
+
+> #### hoodie.metastore.connect.retry.delay
+> Number of seconds for the client to wait between consecutive connection 
attempts<br></br>
+> **Default Value**: 1 (Optional)<br></br>
+> `Config Param: METASTORE_CONNECTION_RETRY_DELAY`<br></br>
+
+---
+
 ### Table Configurations {#Table-Configurations}
 
 Configurations that persist across writes and reads on a Hudi table, like 
base and log file formats, table name, creation schema, and table version 
layouts. Configurations are loaded from hoodie.properties; these properties 
are usually set during initializing a path as a hoodie base path and rarely 
change during the lifetime of the table. Writers'/queries' configurations are 
validated against these each time for compatibility.
@@ -1524,31 +1780,100 @@ Controls memory usage for compaction and merges, 
performed internally by Hudi.
 
 ---
 
-> #### hoodie.memory.compaction.fraction
-> HoodieCompactedLogScanner reads logblocks, converts records to HoodieRecords 
and then merges these log blocks and records. At any point, the number of 
entries in a log block can be less than or equal to the number of entries in 
the corresponding parquet file. This can lead to OOM in the Scanner. Hence, a 
spillable map helps alleviate the memory pressure. Use this config to set the 
max allowable inMemory footprint of the spillable map<br></br>
-> **Default Value**: 0.6 (Optional)<br></br>
-> `Config Param: MAX_MEMORY_FRACTION_FOR_COMPACTION`<br></br>
+> #### hoodie.memory.compaction.fraction
+> HoodieCompactedLogScanner reads logblocks, converts records to HoodieRecords 
and then merges these log blocks and records. At any point, the number of 
entries in a log block can be less than or equal to the number of entries in 
the corresponding parquet file. This can lead to OOM in the Scanner. Hence, a 
spillable map helps alleviate the memory pressure. Use this config to set the 
max allowable inMemory footprint of the spillable map<br></br>
+> **Default Value**: 0.6 (Optional)<br></br>
+> `Config Param: MAX_MEMORY_FRACTION_FOR_COMPACTION`<br></br>
+
+---
+
+> #### hoodie.memory.merge.max.size
+> Maximum amount of memory in bytes used for merge operations, before 
spilling to local storage.<br></br>
+> **Default Value**: 1073741824 (Optional)<br></br>
+> `Config Param: MAX_MEMORY_FOR_MERGE`<br></br>
+
+---
+
+> #### hoodie.memory.spillable.map.path
+> Default file path prefix for spillable map<br></br>
+> **Default Value**: /tmp/ (Optional)<br></br>
+> `Config Param: SPILLABLE_MAP_BASE_PATH`<br></br>
+
+---
+
+> #### hoodie.memory.compaction.max.size
+> Maximum amount of memory in bytes used for compaction operations, before 
spilling to local storage.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: MAX_MEMORY_FOR_COMPACTION`<br></br>
+
+---
+
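
The fraction-vs-absolute interplay above can be made concrete with a little
arithmetic. The helper below is an illustration of the idea only (not Hudi's
exact computation): hoodie.memory.compaction.fraction carves a spillable-map
budget out of available memory, unless the absolute
hoodie.memory.compaction.max.size is set:

```python
def compaction_memory_budget(available_bytes, fraction=0.6, max_size=None):
    """Approximate the spillable-map budget: an absolute max.size, if set,
    wins over the fraction. Illustrative only, not Hudi's exact logic."""
    if max_size is not None:
        return max_size
    return int(available_bytes * fraction)

# e.g. with 4 GiB available and the default 0.6 fraction:
print(compaction_memory_budget(4 * 1024**3))
```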
+### DynamoDB based Locks Configurations {#DynamoDB-based-Locks-Configurations}
+
+Configs that control DynamoDB based locking mechanisms required for 
concurrency control  between writers to a Hudi table. Concurrency between 
Hudi's own table services  are auto managed internally.
+
+`Config Class`: org.apache.hudi.config.DynamoDbBasedLockConfig<br></br>
+> #### hoodie.write.lock.dynamodb.billing_mode
+> For DynamoDB based lock provider, the billing mode of the lock table. By 
default it is PAY_PER_REQUEST mode; the alternative is PROVISIONED<br></br>
+> **Default Value**: PAY_PER_REQUEST (Optional)<br></br>
+> `Config Param: DYNAMODB_LOCK_BILLING_MODE`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.table
+> For DynamoDB based lock provider, the name of the DynamoDB table acting as 
lock table<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: DYNAMODB_LOCK_TABLE_NAME`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.region
+> For DynamoDB based lock provider, the region used in the endpoint for the 
Amazon DynamoDB service. The region is first read from the AWS_REGION 
environment variable; if not found, us-east-1 is used by default<br></br>
+> **Default Value**: us-east-1 (Optional)<br></br>
+> `Config Param: DYNAMODB_LOCK_REGION`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.partition_key
+> For DynamoDB based lock provider, the partition key for the DynamoDB lock 
table. Each Hudi dataset should have its own unique key, so that concurrent 
writers to the same dataset refer to the same partition key. By default, the 
Hudi table name is used as the partition key<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: DYNAMODB_LOCK_PARTITION_KEY`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.write_capacity
+> For DynamoDB based lock provider, write capacity units when using 
PROVISIONED billing mode<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: DYNAMODB_LOCK_WRITE_CAPACITY`<br></br>
+> `Since Version: 0.10.0`<br></br>
 
 ---
 
-> #### hoodie.memory.merge.max.size
-> Maximum amount of memory used  in bytes for merge operations, before 
spilling to local storage.<br></br>
-> **Default Value**: 1073741824 (Optional)<br></br>
-> `Config Param: MAX_MEMORY_FOR_MERGE`<br></br>
+> #### hoodie.write.lock.dynamodb.table_creation_timeout
+> For DynamoDB based lock provider, the maximum number of milliseconds to wait 
for creating DynamoDB table<br></br>
+> **Default Value**: 600000 (Optional)<br></br>
+> `Config Param: DYNAMODB_LOCK_TABLE_CREATION_TIMEOUT`<br></br>
+> `Since Version: 0.10.0`<br></br>
 
 ---
 
-> #### hoodie.memory.spillable.map.path
-> Default file path prefix for spillable map<br></br>
-> **Default Value**: /tmp/ (Optional)<br></br>
-> `Config Param: SPILLABLE_MAP_BASE_PATH`<br></br>
+> #### hoodie.write.lock.dynamodb.read_capacity
+> For DynamoDB based lock provider, read capacity units when using PROVISIONED 
billing mode<br></br>
+> **Default Value**: 20 (Optional)<br></br>
+> `Config Param: DYNAMODB_LOCK_READ_CAPACITY`<br></br>
+> `Since Version: 0.10.0`<br></br>
 
 ---
 
-> #### hoodie.memory.compaction.max.size
-> Maximum amount of memory used  in bytes for compaction operations in bytes , 
before spilling to local storage.<br></br>
+> #### hoodie.write.lock.dynamodb.endpoint_url
+> For DynamoDB based lock provider, the url endpoint used for Amazon DynamoDB 
service. Useful for development with a local dynamodb instance.<br></br>
 > **Default Value**: N/A (Required)<br></br>
-> `Config Param: MAX_MEMORY_FOR_COMPACTION`<br></br>
+> `Config Param: DYNAMODB_ENDPOINT_URL`<br></br>
+> `Since Version: 0.10.1`<br></br>
 
 ---
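
Putting the DynamoDB lock configs above together: a writer enabling
lock-based concurrency control would pass options like the ones below. The
validation helper and the example table/key names are hypothetical; the
config keys are the documented ones:

```python
# Keys the docs above mark as Required for the DynamoDB lock provider.
REQUIRED_DYNAMO_LOCK_KEYS = (
    "hoodie.write.lock.dynamodb.table",
    "hoodie.write.lock.dynamodb.partition_key",
)

def missing_lock_options(options):
    """Hypothetical pre-flight check: report required keys not yet set."""
    return [k for k in REQUIRED_DYNAMO_LOCK_KEYS if k not in options]

lock_options = {
    "hoodie.write.lock.dynamodb.table": "hudi-locks",        # example name
    "hoodie.write.lock.dynamodb.partition_key": "my_table",  # example key
    "hoodie.write.lock.dynamodb.region": "us-east-1",
    "hoodie.write.lock.dynamodb.billing_mode": "PAY_PER_REQUEST",
}
print(missing_lock_options(lock_options))  # -> [] (nothing missing)
```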
 
@@ -1641,6 +1966,14 @@ Configurations that control aspects around writing, 
sizing, reading base and log
 
 ---
 
+> #### hoodie.parquet.field_id.write.enabled
+> Only effective with Spark 3.3+. Sets 
spark.sql.parquet.fieldId.write.enabled. If enabled, Spark will write out 
parquet native field ids, stored inside StructField's metadata as 
parquet.field.id, to parquet files.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: PARQUET_FIELD_ID_WRITE_ENABLED`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
 > #### hoodie.parquet.page.size
 > Parquet page size in bytes. Page is the unit of read within a parquet file. 
 > Within a block, pages are compressed separately.<br></br>
 > **Default Value**: 1048576 (Optional)<br></br>
@@ -1690,72 +2023,80 @@ Configurations that control aspects around writing, 
sizing, reading base and log
 
 ---
 
-### DynamoDB based Locks Configurations {#DynamoDB-based-Locks-Configurations}
+### Archival Configs {#Archival-Configs}
 
-Configs that control DynamoDB based locking mechanisms required for 
concurrency control  between writers to a Hudi table. Concurrency between 
Hudi's own table services  are auto managed internally.
+Configurations that control archival.
 
-`Config Class`: org.apache.hudi.config.DynamoDbBasedLockConfig<br></br>
-> #### hoodie.write.lock.dynamodb.billing_mode
-> For DynamoDB based lock provider, by default it is PAY_PER_REQUEST mode. 
Alternative is PROVISIONED<br></br>
-> **Default Value**: PAY_PER_REQUEST (Optional)<br></br>
-> `Config Param: DYNAMODB_LOCK_BILLING_MODE`<br></br>
-> `Since Version: 0.10.0`<br></br>
+`Config Class`: org.apache.hudi.config.HoodieArchivalConfig<br></br>
+> #### hoodie.archive.merge.small.file.limit.bytes
+> The archive file size limit in bytes, below which an archive file is 
considered a small file and becomes a candidate for merging.<br></br>
+> **Default Value**: 20971520 (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_SMALL_FILE_LIMIT_BYTES`<br></br>
 
 ---
 
-> #### hoodie.write.lock.dynamodb.table
-> For DynamoDB based lock provider, the name of the DynamoDB table acting as 
lock table<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: DYNAMODB_LOCK_TABLE_NAME`<br></br>
-> `Since Version: 0.10.0`<br></br>
+> #### hoodie.keep.max.commits
+> Archiving service moves older entries from the timeline into an archived 
log after each write, to keep the metadata overhead constant even as the table 
size grows. This config controls the maximum number of instants to retain in 
the active timeline.<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: MAX_COMMITS_TO_KEEP`<br></br>
 
 ---
 
-> #### hoodie.write.lock.dynamodb.region
-> For DynamoDB based lock provider, the region used in endpoint for Amazon 
DynamoDB service. Would try to first get it from AWS_REGION environment 
variable. If not find, by default use us-east-1<br></br>
-> **Default Value**: us-east-1 (Optional)<br></br>
-> `Config Param: DYNAMODB_LOCK_REGION`<br></br>
-> `Since Version: 0.10.0`<br></br>
+> #### hoodie.archive.merge.enable
+> When enabled, Hudi will automatically merge several small archive files 
into a larger one. Useful when the storage scheme doesn't support append 
operations.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_ENABLE`<br></br>
 
 ---
 
-> #### hoodie.write.lock.dynamodb.partition_key
-> For DynamoDB based lock provider, the partition key for the DynamoDB lock 
table. Each Hudi dataset should has it's unique key so concurrent writers could 
refer to the same partition key. By default we use the Hudi table name 
specified to be the partition key<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: DYNAMODB_LOCK_PARTITION_KEY`<br></br>
-> `Since Version: 0.10.0`<br></br>
+> #### hoodie.archive.automatic
+> When enabled, the archival table service is invoked immediately after each 
commit, to archive commits if we cross a maximum value of commits. It's 
recommended to enable this, to ensure number of active commits is 
bounded.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: AUTO_ARCHIVE`<br></br>
 
 ---
 
-> #### hoodie.write.lock.dynamodb.write_capacity
-> For DynamoDB based lock provider, write capacity units when using 
PROVISIONED billing mode<br></br>
+> #### hoodie.archive.delete.parallelism
+> Parallelism for deleting archived hoodie commits.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: DELETE_ARCHIVED_INSTANT_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.archive.beyond.savepoint
+> If enabled, archival will proceed beyond savepoint, skipping savepoint 
commits. If disabled, archival will stop at the earliest savepoint 
commit.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ARCHIVE_BEYOND_SAVEPOINT`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
+> #### hoodie.commits.archival.batch
+> Archiving of instants is batched in a best-effort manner, to pack more 
instants into a single archive log. This config controls such archival batch 
size.<br></br>
 > **Default Value**: 10 (Optional)<br></br>
-> `Config Param: DYNAMODB_LOCK_WRITE_CAPACITY`<br></br>
-> `Since Version: 0.10.0`<br></br>
+> `Config Param: COMMITS_ARCHIVAL_BATCH_SIZE`<br></br>
 
 ---
 
-> #### hoodie.write.lock.dynamodb.table_creation_timeout
-> For DynamoDB based lock provider, the maximum number of milliseconds to wait 
for creating DynamoDB table<br></br>
-> **Default Value**: 600000 (Optional)<br></br>
-> `Config Param: DYNAMODB_LOCK_TABLE_CREATION_TIMEOUT`<br></br>
-> `Since Version: 0.10.0`<br></br>
+> #### hoodie.archive.async
+> Only applies when hoodie.archive.automatic is turned on. When turned on, 
runs the archiver asynchronously with writing, which can speed up overall 
write performance.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_ARCHIVE`<br></br>
+> `Since Version: 0.11.0`<br></br>
 
 ---
 
-> #### hoodie.write.lock.dynamodb.read_capacity
-> For DynamoDB based lock provider, read capacity units when using PROVISIONED 
billing mode<br></br>
+> #### hoodie.keep.min.commits
+> Similar to hoodie.keep.max.commits, but controls the minimum number of 
instants to retain in the active timeline.<br></br>
 > **Default Value**: 20 (Optional)<br></br>
-> `Config Param: DYNAMODB_LOCK_READ_CAPACITY`<br></br>
-> `Since Version: 0.10.0`<br></br>
+> `Config Param: MIN_COMMITS_TO_KEEP`<br></br>
 
 ---
 
-> #### hoodie.write.lock.dynamodb.endpoint_url
-> For DynamoDB based lock provider, the url endpoint used for Amazon DynamoDB 
service. Useful for development with a local dynamodb instance.<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: DYNAMODB_ENDPOINT_URL`<br></br>
-> `Since Version: 0.10.1`<br></br>
+> #### hoodie.archive.merge.files.batch.size
+> The number of small archive files to be merged at once.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_FILES_BATCH_SIZE`<br></br>
 
 ---
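
The retention knobs above interact: a commonly cited Hudi constraint is that
hoodie.cleaner.commits.retained should be less than hoodie.keep.min.commits,
which in turn must be less than hoodie.keep.max.commits. A small sanity check
(illustrative, not the engine's actual validation):

```python
def archival_window_ok(cleaner_retained, keep_min, keep_max):
    """Check the ordering cleaner_retained < keep_min < keep_max."""
    return cleaner_retained < keep_min < keep_max

# Documented defaults: cleaner retains 10 commits, timeline keeps 20-30.
print(archival_window_ok(10, 20, 30))  # -> True
```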
 
@@ -1788,22 +2129,6 @@ Configurations used by the Hudi Metadata Table. This 
table maintains the metadat
 
 ---
 
-> #### hoodie.metadata.index.column.stats.enable
-> Enable indexing column ranges of user data files under metadata table key 
lookups. When enabled, metadata table will have a partition to store the column 
ranges and will be used for pruning files during the index lookups.<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: ENABLE_METADATA_INDEX_COLUMN_STATS`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.metadata.index.bloom.filter.column.list
-> Comma-separated list of columns for which bloom filter index will be built. 
If not set, only record key will be indexed.<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: BLOOM_FILTER_INDEX_FOR_COLUMNS`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
 > #### hoodie.metadata.metrics.enable
 > Enable publishing of metrics around metadata table.<br></br>
 > **Default Value**: false (Optional)<br></br>
@@ -1820,22 +2145,6 @@ Configurations used by the Hudi Metadata Table. This 
table maintains the metadat
 
 ---
 
-> #### hoodie.metadata.cleaner.commits.retained
-> Number of commits to retain, without cleaning, on metadata table.<br></br>
-> **Default Value**: 3 (Optional)<br></br>
-> `Config Param: CLEANER_COMMITS_RETAINED`<br></br>
-> `Since Version: 0.7.0`<br></br>
-
----
-
-> #### hoodie.metadata.index.check.timeout.seconds
-> After the async indexer has finished indexing upto the base instant, it will 
ensure that all inflight writers reliably write index updates as well. If this 
timeout expires, then the indexer will abort itself safely.<br></br>
-> **Default Value**: 900 (Optional)<br></br>
-> `Config Param: METADATA_INDEX_CHECK_TIMEOUT_SECONDS`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
 > #### _hoodie.metadata.ignore.spurious.deletes
 > There are cases when extra files are requested to be deleted from metadata 
 > table which are never added before. This config determines how to handle 
 > such spurious deletes<br></br>
 > **Default Value**: true (Optional)<br></br>
@@ -1852,14 +2161,6 @@ Configurations used by the Hudi Metadata Table. This 
table maintains the metadat
 
 ---
 
-> #### hoodie.metadata.populate.meta.fields
-> When enabled, populates all meta fields. When disabled, no meta fields are 
populated.<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: POPULATE_META_FIELDS`<br></br>
-> `Since Version: 0.10.0`<br></br>
-
----
-
 > #### hoodie.metadata.index.async
 > Enable asynchronous indexing of metadata table.<br></br>
 > **Default Value**: false (Optional)<br></br>
@@ -1884,22 +2185,6 @@ Configurations used by the Hudi Metadata Table. This 
table maintains the metadat
 
 ---
 
-> #### hoodie.metadata.index.column.stats.file.group.count
-> Metadata column stats partition file group count. This controls the size of 
the base and log files and read parallelism in the column stats index 
partition. The recommendation is to size the file group count such that the 
base files are under 1GB.<br></br>
-> **Default Value**: 2 (Optional)<br></br>
-> `Config Param: METADATA_INDEX_COLUMN_STATS_FILE_GROUP_COUNT`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.metadata.enable
-> Enable the internal metadata table which serves table metadata like level 
file listings<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: ENABLE`<br></br>
-> `Since Version: 0.7.0`<br></br>
-
----
-
 > #### hoodie.metadata.index.bloom.filter.enable
 > Enable indexing bloom filters of user data files under metadata table. When 
 > enabled, metadata table will have a partition to store the bloom filter 
 > index and will be used during the index lookups.<br></br>
 > **Default Value**: false (Optional)<br></br>
@@ -1908,14 +2193,6 @@ Configurations used by the Hudi Metadata Table. This 
table maintains the metadat
 
 ---
 
-> #### hoodie.metadata.index.bloom.filter.parallelism
-> Parallelism to use for generating bloom filter index in metadata 
table.<br></br>
-> **Default Value**: 200 (Optional)<br></br>
-> `Config Param: BLOOM_FILTER_INDEX_PARALLELISM`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
 > #### hoodie.metadata.clean.async
 > Enable asynchronous cleaning for metadata table<br></br>
 > **Default Value**: false (Optional)<br></br>
@@ -1948,6 +2225,14 @@ Configurations used by the Hudi Metadata Table. This 
table maintains the metadat
 
 ---
 
+> #### hoodie.metadata.index.column.stats.processing.mode.override
+> By default, the Column Stats Index automatically determines whether it 
should be read and processed either 'in-memory' (within the executing process) 
or using Spark (on a cluster), based on factors like the size of the index and 
how many columns are read. This config allows overriding that 
behavior.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: COLUMN_STATS_INDEX_PROCESSING_MODE_OVERRIDE`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
 > #### hoodie.metadata.keep.min.commits
 > Archiving service moves older entries from metadata table’s timeline into an 
 > archived log after each write, to keep the overhead constant, even as the 
 > metadata table size grows.  This config controls the minimum number of 
 > instants to retain in the active timeline.<br></br>
 > **Default Value**: 20 (Optional)<br></br>
@@ -1956,6 +2241,78 @@ Configurations used by the Hudi Metadata Table. This 
table maintains the metadat
 
 ---
 
+> #### hoodie.metadata.index.column.stats.inMemory.projection.threshold
+> When reading the Column Stats Index, if the size of the expected resulting 
projection is below the in-memory threshold (counted by the number of rows), 
it will be attempted to be loaded "in-memory" (i.e. not using an execution 
engine like Spark, Flink, etc). If the value is above the threshold, the 
execution engine will be used to compose the projection.<br></br>
+> **Default Value**: 100000 (Optional)<br></br>
+> `Config Param: COLUMN_STATS_INDEX_IN_MEMORY_PROJECTION_THRESHOLD`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.column.stats.enable
+> Enable indexing of column ranges of user data files in the metadata table. 
When enabled, the metadata table will have a partition to store the column 
ranges, which will be used for pruning files during index lookups.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ENABLE_METADATA_INDEX_COLUMN_STATS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.bloom.filter.column.list
+> Comma-separated list of columns for which bloom filter index will be built. 
If not set, only record key will be indexed.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BLOOM_FILTER_INDEX_FOR_COLUMNS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.cleaner.commits.retained
+> Number of commits to retain, without cleaning, on metadata table.<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: CLEANER_COMMITS_RETAINED`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.check.timeout.seconds
+> After the async indexer has finished indexing up to the base instant, it 
will ensure that all inflight writers reliably write index updates as well. If 
this timeout expires, then the indexer will abort itself safely.<br></br>
+> **Default Value**: 900 (Optional)<br></br>
+> `Config Param: METADATA_INDEX_CHECK_TIMEOUT_SECONDS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.populate.meta.fields
+> When enabled, populates all meta fields. When disabled, no meta fields are 
populated.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: POPULATE_META_FIELDS`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.column.stats.file.group.count
+> Metadata column stats partition file group count. This controls the size of 
the base and log files and read parallelism in the column stats index 
partition. The recommendation is to size the file group count such that the 
base files are under 1GB.<br></br>
+> **Default Value**: 2 (Optional)<br></br>
+> `Config Param: METADATA_INDEX_COLUMN_STATS_FILE_GROUP_COUNT`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.enable
+> Enable the internal metadata table, which serves table metadata like file 
listings<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ENABLE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.bloom.filter.parallelism
+> Parallelism to use for generating bloom filter index in metadata 
table.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: BLOOM_FILTER_INDEX_PARALLELISM`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
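
The guidance for hoodie.metadata.index.column.stats.file.group.count above
("size the file group count such that the base files are under 1GB") is a
simple division. An illustrative back-of-envelope helper (the size estimate
is an assumed input, not something Hudi computes for you):

```python
import math

def column_stats_file_group_count(estimated_index_bytes,
                                  max_base_file_bytes=1 << 30):
    """Smallest file group count keeping each base file under the cap."""
    return max(1, math.ceil(estimated_index_bytes / max_base_file_bytes))

# e.g. roughly 5 GB of column stats suggests 5 file groups
print(column_stats_file_group_count(5 * (1 << 30)))
```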
+
 ### Consistency Guard Configurations {#Consistency-Guard-Configurations}
 
 The consistency guard related config options, to help talk to eventually 
consistent object storage.(Tip: S3 is NOT eventually consistent anymore!)
@@ -2100,13 +2457,6 @@ Configurations that control write behavior on Hudi 
tables. These can be directly
 
 ---
 
-> #### hoodie.schema.on.read.enable
-> enable full schema evolution for hoodie<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: SCHEMA_EVOLUTION_ENABLE`<br></br>
-
----
-
 > #### hoodie.table.services.enabled
 > Master control to disable all table services including archive, clean, 
 > compact, cluster, etc.<br></br>
 > **Default Value**: true (Optional)<br></br>
@@ -2257,6 +2607,14 @@ Configurations that control write behavior on Hudi 
tables. These can be directly
 
 ---
 
+> #### hoodie.skip.default.partition.validation
+> When a table is upgraded from pre-0.12 to 0.12, we check for a "default" 
partition and fail if one is found. Users are expected to rewrite the data in 
those partitions. Enabling this config will bypass this validation<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: SKIP_DEFAULT_PARTITION_VALIDATION`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
 > #### hoodie.markers.timeline_server_based.batch.num_threads
 > Number of threads to use for batch processing marker creation requests at 
 > the timeline server<br></br>
 > **Default Value**: 20 (Optional)<br></br>
@@ -2330,7 +2688,7 @@ Configurations that control write behavior on Hudi 
tables. These can be directly
 
 > #### hoodie.bulkinsert.sort.mode
 > Sorting modes to use for sorting records for bulk insert. This is used when 
 > hoodie.bulkinsert.user.defined.partitioner.class is not configured. 
 > Available values are - GLOBAL_SORT: this ensures best file sizes, with 
 > lowest memory overhead at cost of sorting. PARTITION_SORT: Strikes a balance 
 > by only sorting within a partition, still keeping the memory overhead of 
 > writing lowest and best effort file sizing. NONE: No sorting. Fastest and 
 > matches `spark.write.parquet()` in terms of numb [...]
-> **Default Value**: GLOBAL_SORT (Optional)<br></br>
+> **Default Value**: NONE (Optional)<br></br>
 > `Config Param: BULK_INSERT_SORT_MODE`<br></br>
 
 ---
@@ -2371,13 +2729,6 @@ Configurations that control write behavior on Hudi 
tables. These can be directly
 
 ---
 
-> #### hoodie.refresh.timeline.server.based.on.latest.commit
-> Refresh timeline in timeline server based on latest commit apart from 
timeline hash difference. By default (false), <br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: REFRESH_TIMELINE_SERVER_BASED_ON_LATEST_COMMIT`<br></br>
-
----
-
 > #### hoodie.upsert.shuffle.parallelism
 > Parallelism to use for upsert operation on the table. Upserts can shuffle 
 > data to perform index lookups, file sizing, bin packing records optimally 
 > into file groups.<br></br>
 > **Default Value**: 200 (Optional)<br></br>
@@ -2548,20 +2899,6 @@ Configurations that control indexing behavior (when 
HBase based indexing is enab
 
 ---
 
-> #### hoodie.hbase.index.update.partition.path
-> Only applies if index type is HBASE. When an already existing record is 
upserted to a new partition compared to whats in storage, this config when set, 
will delete old record in old partition and will insert it as new record in new 
partition.<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: UPDATE_PARTITION_PATH_ENABLE`<br></br>
-
----
-
-> #### hoodie.index.hbase.qps.allocator.class
-> Property to set which implementation of HBase QPS resource allocator to be 
used, whichcontrols the batching rate dynamically.<br></br>
-> **Default Value**: 
org.apache.hudi.index.hbase.DefaultHBaseQPSResourceAllocator (Optional)<br></br>
-> `Config Param: QPS_ALLOCATOR_CLASS_NAME`<br></br>
-
----
-
 > #### hoodie.index.hbase.put.batch.size.autocompute
 > Property to set to enable auto computation of put batch size<br></br>
 > **Default Value**: false (Optional)<br></br>
@@ -2569,6 +2906,13 @@ Configurations that control indexing behavior (when 
HBase based indexing is enab
 
 ---
 
+> #### hoodie.index.hbase.bucket.number
+> Only applicable when using RebalancedSparkHoodieHBaseIndex; setting this to 
the same value as the HBase region count can give the best 
performance<br></br>
+> **Default Value**: 8 (Optional)<br></br>
+> `Config Param: BUCKET_NUMBER`<br></br>
+
+---
+
 > #### hoodie.index.hbase.rollback.sync
 > When set to true, the rollback method will delete the last failed task 
 > index. The default value is false. Because deleting the index will add extra 
 > load on the Hbase cluster for each rollback<br></br>
 > **Default Value**: false (Optional)<br></br>
@@ -2576,20 +2920,6 @@ Configurations that control indexing behavior (when 
HBase based indexing is enab
 
 ---
 
-> #### hoodie.index.hbase.get.batch.size
-> Controls the batch size for performing gets against HBase. Batching improves 
throughput, by saving round trips.<br></br>
-> **Default Value**: 100 (Optional)<br></br>
-> `Config Param: GET_BATCH_SIZE`<br></br>
-
----
-
-> #### hoodie.index.hbase.zkpath.qps_root
-> chroot in zookeeper, to use for all qps allocation co-ordination.<br></br>
-> **Default Value**: /QPS_ROOT (Optional)<br></br>
-> `Config Param: ZKPATH_QPS_ROOT`<br></br>
-
----
-
 > #### hoodie.index.hbase.max.qps.per.region.server
 > Property to set maximum QPS allowed per Region Server. This should be same 
 > across various jobs. This is intended to
  limit the aggregate QPS generated across various jobs to an Hbase Region 
Server. It is recommended to set this
@@ -2600,13 +2930,6 @@ Configurations that control indexing behavior (when 
HBase based indexing is enab
 
 ---
 
-> #### hoodie.index.hbase.max.qps.fraction
-> Maximum for HBASE_QPS_FRACTION_PROP to stabilize skewed write 
workloads<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: MAX_QPS_FRACTION`<br></br>
-
----
-
 > #### hoodie.index.hbase.min.qps.fraction
 > Minimum for HBASE_QPS_FRACTION_PROP to stabilize skewed write 
 > workloads<br></br>
 > **Default Value**: N/A (Required)<br></br>
@@ -2628,13 +2951,6 @@ Configurations that control indexing behavior (when 
HBase based indexing is enab
 
 ---
 
-> #### hoodie.index.hbase.dynamic_qps
-> Property to decide if HBASE_QPS_FRACTION_PROP is dynamically calculated 
based on write volume.<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: COMPUTE_QPS_DYNAMICALLY`<br></br>
-
----
-
 > #### hoodie.index.hbase.zknode.path
 > Only applies if index type is HBASE. This is the root znode that will 
 > contain all the znodes created/used by HBase<br></br>
 > **Default Value**: N/A (Required)<br></br>
@@ -2642,6 +2958,13 @@ Configurations that control indexing behavior (when 
HBase based indexing is enab
 
 ---
 
+> #### hoodie.index.hbase.kerberos.user.keytab
+> File name of the Kerberos keytab file for connecting to the HBase cluster.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: KERBEROS_USER_KEYTAB`<br></br>
+
+---
+
 > #### hoodie.index.hbase.zkquorum
 > Only applies if index type is HBASE. HBase ZK Quorum url to connect 
 > to<br></br>
 > **Default Value**: N/A (Required)<br></br>
@@ -2670,6 +2993,76 @@ Configurations that control indexing behavior (when 
HBase based indexing is enab
 
 ---
 
+> #### hoodie.hbase.index.update.partition.path
+> Only applies if index type is HBASE. When an existing record is upserted to a new partition compared to what is in storage, this config, when set, deletes the old record in the old partition and inserts it as a new record in the new partition.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: UPDATE_PARTITION_PATH_ENABLE`<br></br>
+
+---
+
+> #### hoodie.index.hbase.security.authentication
+> Property to decide whether secure authentication is enabled for the HBase cluster. Possible values are 'simple' (no authentication) and 'kerberos'.<br></br>
+> **Default Value**: simple (Optional)<br></br>
+> `Config Param: SECURITY_AUTHENTICATION`<br></br>
+
+---
+
+> #### hoodie.index.hbase.qps.allocator.class
+> Property to set which implementation of HBase QPS resource allocator is used, which controls the batching rate dynamically.<br></br>
+> **Default Value**: 
org.apache.hudi.index.hbase.DefaultHBaseQPSResourceAllocator (Optional)<br></br>
+> `Config Param: QPS_ALLOCATOR_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.index.hbase.get.batch.size
+> Controls the batch size for performing gets against HBase. Batching improves throughput by saving round trips.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: GET_BATCH_SIZE`<br></br>
+
+---
+
+> #### hoodie.index.hbase.zkpath.qps_root
+> Chroot in ZooKeeper to use for all QPS allocation coordination.<br></br>
+> `Config Param: ZKPATH_QPS_ROOT`<br></br>
+
+---
+
+> #### hoodie.index.hbase.max.qps.fraction
+> Maximum for HBASE_QPS_FRACTION_PROP to stabilize skewed write 
workloads<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: MAX_QPS_FRACTION`<br></br>
+
+---
+
+> #### hoodie.index.hbase.regionserver.kerberos.principal
+> The value of hbase.regionserver.kerberos.principal in the HBase cluster.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: REGIONSERVER_PRINCIPAL`<br></br>
+
+---
+
+> #### hoodie.index.hbase.dynamic_qps
+> Property to decide if HBASE_QPS_FRACTION_PROP is dynamically calculated 
based on write volume.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: COMPUTE_QPS_DYNAMICALLY`<br></br>
+
+---
+
+> #### hoodie.index.hbase.master.kerberos.principal
+> The value of hbase.master.kerberos.principal in the HBase cluster.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: MASTER_PRINCIPAL`<br></br>
+
+---
+
+> #### hoodie.index.hbase.kerberos.user.principal
+> The Kerberos principal name for connecting to the HBase cluster.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: KERBEROS_USER_PRINCIPAL`<br></br>
+
+---
+
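The Kerberos options above travel as a group. The sketch below (plain Python, no Hudi dependency; all values and the helper itself are hypothetical, only the config keys come from this page) shows one way to sanity-check that a 'kerberos'-authenticated HBase index has its required companions set:

```python
# Config keys are the ones documented above; values and helper are illustrative.
KERBEROS_REQUIRED = (
    "hoodie.index.hbase.kerberos.user.principal",
    "hoodie.index.hbase.kerberos.user.keytab",
    "hoodie.index.hbase.regionserver.kerberos.principal",
    "hoodie.index.hbase.master.kerberos.principal",
)

def missing_kerberos_opts(opts):
    """Return the Kerberos options still unset when auth is 'kerberos'."""
    if opts.get("hoodie.index.hbase.security.authentication", "simple") != "kerberos":
        return []  # 'simple' auth needs no principals or keytab
    return [k for k in KERBEROS_REQUIRED if k not in opts]

opts = {
    "hoodie.index.type": "HBASE",
    "hoodie.index.hbase.security.authentication": "kerberos",
    "hoodie.index.hbase.kerberos.user.principal": "hudi-writer@EXAMPLE.COM",
    "hoodie.index.hbase.kerberos.user.keytab": "hudi-writer.keytab",
    "hoodie.index.hbase.regionserver.kerberos.principal": "hbase/_HOST@EXAMPLE.COM",
    "hoodie.index.hbase.master.kerberos.principal": "hbase/_HOST@EXAMPLE.COM",
}
```

With authentication left at its default of 'simple', none of the principal/keytab options need to be provided.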
 > #### hoodie.index.hbase.desired_puts_time_in_secs
 > <br></br>
 > **Default Value**: 600 (Optional)<br></br>
@@ -2914,8 +3307,16 @@ Configs that control locking mechanisms required for 
concurrency control  betwee
 
 ---
 
+> #### hoodie.write.lock.filesystem.expire
+> For DFS based lock providers, the expiration time in minutes; must be a nonnegative number. The default of 0 means the lock never expires.<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: FILESYSTEM_LOCK_EXPIRE`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
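The expire setting above interacts with the 0-means-never default. A minimal sketch of that interpretation (plain Python; the helper is hypothetical, only the config key comes from this page):

```python
def effective_lock_expiry_minutes(opts):
    """Interpret hoodie.write.lock.filesystem.expire: 0 (the default)
    means the lock file never expires, modeled here as None."""
    minutes = int(opts.get("hoodie.write.lock.filesystem.expire", "0"))
    if minutes < 0:
        raise ValueError("hoodie.write.lock.filesystem.expire must be nonnegative")
    return None if minutes == 0 else minutes

# Hypothetical options map for a DFS-based lock provider.
fs_lock_opts = {"hoodie.write.lock.filesystem.expire": "10"}
```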
 > #### hoodie.write.lock.filesystem.path
-> For DFS based lock providers, path to store the locks under.<br></br>
+> For DFS based lock providers, path to store the locks under. Defaults to the table's meta path.<br></br>
 > **Default Value**: N/A (Required)<br></br>
 > `Config Param: FILESYSTEM_LOCK_PATH`<br></br>
 > `Since Version: 0.8.0`<br></br>
@@ -2939,177 +3340,50 @@ Configs that control locking mechanisms required for 
concurrency control  betwee
 ---
 
 > #### hoodie.write.lock.conflict.resolution.strategy
-> Lock provider class name, this should be subclass of 
org.apache.hudi.client.transaction.ConflictResolutionStrategy<br></br>
-> **Default Value**: 
org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy
 (Optional)<br></br>
-> `Config Param: WRITE_CONFLICT_RESOLUTION_STRATEGY_CLASS_NAME`<br></br>
-> `Since Version: 0.8.0`<br></br>
-
----
-
-> #### hoodie.write.lock.hivemetastore.database
-> For Hive based lock provider, the Hive database to acquire lock 
against<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: HIVE_DATABASE_NAME`<br></br>
-> `Since Version: 0.8.0`<br></br>
-
----
-
-> #### hoodie.write.lock.hivemetastore.uris
-> For Hive based lock provider, the Hive metastore URI to acquire locks 
against.<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: HIVE_METASTORE_URI`<br></br>
-> `Since Version: 0.8.0`<br></br>
-
----
-
-> #### hoodie.write.lock.max_wait_time_ms_between_retry
-> Maximum amount of time to wait between retries by lock provider client. This 
bounds the maximum delay from the exponential backoff. Currently used by ZK 
based lock provider only.<br></br>
-> **Default Value**: 5000 (Optional)<br></br>
-> `Config Param: LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS`<br></br>
-> `Since Version: 0.8.0`<br></br>
-
----
-
-> #### hoodie.write.lock.client.wait_time_ms_between_retry
-> Amount of time to wait between retries on the lock provider by the lock 
manager<br></br>
-> **Default Value**: 10000 (Optional)<br></br>
-> `Config Param: LOCK_ACQUIRE_CLIENT_RETRY_WAIT_TIME_IN_MILLIS`<br></br>
-> `Since Version: 0.8.0`<br></br>
-
----
-
-### Compaction Configs {#Compaction-Configs}
-
-Configurations that control compaction (merging of log files onto a new base 
files) as well as  cleaning (reclamation of older/unused file groups/slices).
-
-`Config Class`: org.apache.hudi.config.HoodieCompactionConfig<br></br>
-> #### hoodie.compaction.payload.class
-> This needs to be same as class used during insert/upserts. Just like 
writing, compaction also uses the record payload class to merge records in the 
log against each other, merge again with the base file and produce the final 
record to be written after compaction.<br></br>
-> **Default Value**: 
org.apache.hudi.common.model.OverwriteWithLatestAvroPayload (Optional)<br></br>
-> `Config Param: PAYLOAD_CLASS_NAME`<br></br>
-
----
-
-> #### hoodie.copyonwrite.record.size.estimate
-> The average record size. If not explicitly specified, hudi will compute the 
record size estimate compute dynamically based on commit metadata.  This is 
critical in computing the insert parallelism and bin-packing inserts into small 
files.<br></br>
-> **Default Value**: 1024 (Optional)<br></br>
-> `Config Param: COPY_ON_WRITE_RECORD_SIZE_ESTIMATE`<br></br>
-
----
-
-> #### hoodie.cleaner.policy
-> Cleaning policy to be used. The cleaner service deletes older file slices 
files to re-claim space. By default, cleaner spares the file slices written by 
the last N commits, determined by  hoodie.cleaner.commits.retained Long running 
query plans may often refer to older file slices and will break if those are 
cleaned, before the query has had   a chance to run. So, it is good to make 
sure that the data is retained for more than the maximum query execution 
time<br></br>
-> **Default Value**: KEEP_LATEST_COMMITS (Optional)<br></br>
-> `Config Param: CLEANER_POLICY`<br></br>
-
----
-
-> #### hoodie.compact.inline.max.delta.seconds
-> Number of elapsed seconds after the last compaction, before scheduling a new 
one.<br></br>
-> **Default Value**: 3600 (Optional)<br></br>
-> `Config Param: INLINE_COMPACT_TIME_DELTA_SECONDS`<br></br>
-
----
-
-> #### hoodie.cleaner.delete.bootstrap.base.file
-> When set to true, cleaner also deletes the bootstrap base file when it's 
skeleton base file is  cleaned. Turn this to true, if you want to ensure the 
bootstrap dataset storage is reclaimed over time, as the table receives 
updates/deletes. Another reason to turn this on, would be to ensure data 
residing in bootstrap  base files are also physically deleted, to comply with 
data privacy enforcement processes.<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: CLEANER_BOOTSTRAP_BASE_FILE_ENABLE`<br></br>
-
----
-
-> #### hoodie.archive.merge.enable
-> When enable, hoodie will auto merge several small archive files into larger 
one. It's useful when storage scheme doesn't support append operation.<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: ARCHIVE_MERGE_ENABLE`<br></br>
-
----
-
-> #### hoodie.cleaner.commits.retained
-> Number of commits to retain, without cleaning. This will be retained for 
num_of_commits * time_between_commits (scheduled). This also directly 
translates into how much data retention the table supports for incremental 
queries.<br></br>
-> **Default Value**: 10 (Optional)<br></br>
-> `Config Param: CLEANER_COMMITS_RETAINED`<br></br>
-
----
-
-> #### hoodie.cleaner.policy.failed.writes
-> Cleaning policy for failed writes to be used. Hudi will delete any files 
written by failed writes to re-claim space. Choose to perform this rollback of 
failed writes eagerly before every writer starts (only supported for single 
writer) or lazily by the cleaner (required for multi-writers)<br></br>
-> **Default Value**: EAGER (Optional)<br></br>
-> `Config Param: FAILED_WRITES_CLEANER_POLICY`<br></br>
-
----
-
-> #### hoodie.compaction.logfile.size.threshold
-> Only if the log file size is greater than the threshold in bytes, the file 
group will be compacted.<br></br>
-> **Default Value**: 0 (Optional)<br></br>
-> `Config Param: COMPACTION_LOG_FILE_SIZE_THRESHOLD`<br></br>
-
----
-
-> #### hoodie.clean.async
-> Only applies when hoodie.clean.automatic is turned on. When turned on runs 
cleaner async with writing, which can speed up overall write 
performance.<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: ASYNC_CLEAN`<br></br>
-
----
-
-> #### hoodie.clean.automatic
-> When enabled, the cleaner table service is invoked immediately after each 
commit, to delete older file slices. It's recommended to enable this, to ensure 
metadata and data storage growth is bounded.<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: AUTO_CLEAN`<br></br>
-
----
-
-> #### hoodie.commits.archival.batch
-> Archiving of instants is batched in best-effort manner, to pack more 
instants into a single archive log. This config controls such archival batch 
size.<br></br>
-> **Default Value**: 10 (Optional)<br></br>
-> `Config Param: COMMITS_ARCHIVAL_BATCH_SIZE`<br></br>
-
----
-
-> #### hoodie.compaction.reverse.log.read
-> HoodieLogFormatReader reads a logfile in the forward direction starting from 
pos=0 to pos=file_length. If this config is set to true, the reader reads the 
logfile in reverse direction, from pos=file_length to pos=0<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: COMPACTION_REVERSE_LOG_READ_ENABLE`<br></br>
+> Lock provider class name, this should be subclass of 
org.apache.hudi.client.transaction.ConflictResolutionStrategy<br></br>
+> **Default Value**: 
org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy
 (Optional)<br></br>
+> `Config Param: WRITE_CONFLICT_RESOLUTION_STRATEGY_CLASS_NAME`<br></br>
+> `Since Version: 0.8.0`<br></br>
 
 ---
 
-> #### hoodie.clean.allow.multiple
-> Allows scheduling/executing multiple cleans by enabling this config. If 
users prefer to strictly ensure clean requests should be mutually exclusive, 
.i.e. a 2nd clean will not be scheduled if another clean is not yet completed 
to avoid repeat cleaning of same files, they might want to disable this 
config.<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: ALLOW_MULTIPLE_CLEANS`<br></br>
-> `Since Version: 0.11.0`<br></br>
+> #### hoodie.write.lock.hivemetastore.database
+> For Hive based lock provider, the Hive database to acquire lock 
against<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_DATABASE_NAME`<br></br>
+> `Since Version: 0.8.0`<br></br>
 
 ---
 
-> #### hoodie.archive.merge.small.file.limit.bytes
-> This config sets the archive file size limit below which an archive file 
becomes a candidate to be selected as such a small file.<br></br>
-> **Default Value**: 20971520 (Optional)<br></br>
-> `Config Param: ARCHIVE_MERGE_SMALL_FILE_LIMIT_BYTES`<br></br>
+> #### hoodie.write.lock.hivemetastore.uris
+> For Hive based lock provider, the Hive metastore URI to acquire locks 
against.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_METASTORE_URI`<br></br>
+> `Since Version: 0.8.0`<br></br>
 
 ---
 
-> #### hoodie.cleaner.fileversions.retained
-> When KEEP_LATEST_FILE_VERSIONS cleaning policy is used,  the minimum number 
of file slices to retain in each file group, during cleaning.<br></br>
-> **Default Value**: 3 (Optional)<br></br>
-> `Config Param: CLEANER_FILE_VERSIONS_RETAINED`<br></br>
+> #### hoodie.write.lock.max_wait_time_ms_between_retry
+> Maximum amount of time to wait between retries by lock provider client. This 
bounds the maximum delay from the exponential backoff. Currently used by ZK 
based lock provider only.<br></br>
+> **Default Value**: 5000 (Optional)<br></br>
+> `Config Param: LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS`<br></br>
+> `Since Version: 0.8.0`<br></br>
 
 ---
 
-> #### hoodie.compact.inline
-> When set to true, compaction service is triggered after each write. While 
being  simpler operationally, this adds extra latency on the write 
path.<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: INLINE_COMPACT`<br></br>
+> #### hoodie.write.lock.client.wait_time_ms_between_retry
+> Amount of time to wait between retries on the lock provider by the lock 
manager<br></br>
+> **Default Value**: 10000 (Optional)<br></br>
+> `Config Param: LOCK_ACQUIRE_CLIENT_RETRY_WAIT_TIME_IN_MILLIS`<br></br>
+> `Since Version: 0.8.0`<br></br>
 
 ---
 
-> #### hoodie.clean.max.commits
-> Number of commits after the last clean operation, before scheduling of a new 
clean is attempted.<br></br>
-> **Default Value**: 1 (Optional)<br></br>
-> `Config Param: CLEAN_MAX_COMMITS`<br></br>
+### Compaction Configs {#Compaction-Configs}
 
----
+Configurations that control compaction (merging of log files onto new base files).
 
+`Config Class`: org.apache.hudi.config.HoodieCompactionConfig<br></br>
 > #### hoodie.compaction.lazy.block.read
 > When merging the delta log files, this config helps to choose whether the 
 > log blocks should be read lazily or not. Choose true to use lazy block 
 > reading (low memory usage, but incurs seeks to each block header) or false 
 > for immediate block read (higher memory usage)<br></br>
 > **Default Value**: true (Optional)<br></br>
@@ -3117,21 +3391,6 @@ Configurations that control compaction (merging of log 
files onto a new base fil
 
 ---
 
-> #### hoodie.archive.merge.files.batch.size
-> The number of small archive files to be merged at once.<br></br>
-> **Default Value**: 10 (Optional)<br></br>
-> `Config Param: ARCHIVE_MERGE_FILES_BATCH_SIZE`<br></br>
-
----
-
-> #### hoodie.archive.async
-> Only applies when hoodie.archive.automatic is turned on. When turned on runs 
archiver async with writing, which can speed up overall write 
performance.<br></br>
-> **Default Value**: false (Optional)<br></br>
-> `Config Param: ASYNC_ARCHIVE`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
 > #### hoodie.parquet.small.file.limit
 > During upsert operation, we opportunistically expand existing small files on 
 > storage, instead of writing new files, to keep number of files to an 
 > optimum. This config sets the file size limit below which a file on storage  
 > becomes a candidate to be selected as such a `small file`. By default, treat 
 > any file <= 100MB as a small file. Also note that if this set <= 0, will not 
 > try to get small files and directly write new files<br></br>
 > **Default Value**: 104857600 (Optional)<br></br>
@@ -3146,10 +3405,17 @@ Configurations that control compaction (merging of log 
files onto a new base fil
 
 ---
 
-> #### hoodie.cleaner.hours.retained
-> Number of hours for which commits need to be retained. This config provides 
a more flexible option ascompared to number of commits retained for cleaning 
service. Setting this property ensures all the files, but the latest in a file 
group, corresponding to commits with commit times older than the configured 
number of hours to be retained are cleaned.<br></br>
-> **Default Value**: 24 (Optional)<br></br>
-> `Config Param: CLEANER_HOURS_RETAINED`<br></br>
+> #### hoodie.copyonwrite.record.size.estimate
+> The average record size. If not explicitly specified, Hudi will compute the record size estimate dynamically based on commit metadata. This is critical in computing the insert parallelism and bin-packing inserts into small files.<br></br>
+> **Default Value**: 1024 (Optional)<br></br>
+> `Config Param: COPY_ON_WRITE_RECORD_SIZE_ESTIMATE`<br></br>
+
+---
+
+> #### hoodie.compact.inline.max.delta.seconds
+> Number of elapsed seconds after the last compaction, before scheduling a new 
one.<br></br>
+> **Default Value**: 3600 (Optional)<br></br>
+> `Config Param: INLINE_COMPACT_TIME_DELTA_SECONDS`<br></br>
 
 ---
 
@@ -3160,17 +3426,10 @@ Configurations that control compaction (merging of log 
files onto a new base fil
 
 ---
 
-> #### hoodie.archive.automatic
-> When enabled, the archival table service is invoked immediately after each 
commit, to archive commits if we cross a maximum value of commits. It's 
recommended to enable this, to ensure number of active commits is 
bounded.<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: AUTO_ARCHIVE`<br></br>
-
----
-
-> #### hoodie.clean.trigger.strategy
-> Controls how cleaning is scheduled. Valid options: NUM_COMMITS<br></br>
-> **Default Value**: NUM_COMMITS (Optional)<br></br>
-> `Config Param: CLEAN_TRIGGER_STRATEGY`<br></br>
+> #### hoodie.compaction.logfile.size.threshold
+> A file group will be compacted only if its log file size is greater than this threshold in bytes.<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: COMPACTION_LOG_FILE_SIZE_THRESHOLD`<br></br>
 
 ---
 
@@ -3196,27 +3455,6 @@ Configurations that control compaction (merging of log 
files onto a new base fil
 
 ---
 
-> #### hoodie.keep.min.commits
-> Similar to hoodie.keep.max.commits, but controls the minimum number 
ofinstants to retain in the active timeline.<br></br>
-> **Default Value**: 20 (Optional)<br></br>
-> `Config Param: MIN_COMMITS_TO_KEEP`<br></br>
-
----
-
-> #### hoodie.cleaner.parallelism
-> Parallelism for the cleaning operation. Increase this if cleaning becomes 
slow.<br></br>
-> **Default Value**: 200 (Optional)<br></br>
-> `Config Param: CLEANER_PARALLELISM_VALUE`<br></br>
-
----
-
-> #### hoodie.cleaner.incremental.mode
-> When enabled, the plans for each cleaner service run is computed 
incrementally off the events  in the timeline, since the last cleaner run. This 
is much more efficient than obtaining listings for the full table for each 
planning (even with a metadata table).<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: CLEANER_INCREMENTAL_MODE_ENABLE`<br></br>
-
----
-
 > #### hoodie.record.size.estimation.threshold
 > We use the previous commits' metadata to calculate the estimated record size 
 > and use it  to bin pack records into partitions. If the previous commit is 
 > too small to make an accurate estimation,  Hudi will search commits in the 
 > reverse order, until we find a commit that has totalBytesWritten  larger 
 > than (PARQUET_SMALL_FILE_LIMIT_BYTES * this_threshold)<br></br>
 > **Default Value**: 1.0 (Optional)<br></br>
@@ -3225,23 +3463,16 @@ Configurations that control compaction (merging of log 
files onto a new base fil
 ---
 
 > #### hoodie.compact.inline.trigger.strategy
-> Controls how compaction scheduling is triggered, by time or num delta 
commits or combination of both. Valid options: 
NUM_COMMITS,TIME_ELAPSED,NUM_AND_TIME,NUM_OR_TIME<br></br>
+> Controls how compaction scheduling is triggered, by time or num delta 
commits or combination of both. Valid options: 
NUM_COMMITS,NUM_COMMITS_AFTER_LAST_REQUEST,TIME_ELAPSED,NUM_AND_TIME,NUM_OR_TIME<br></br>
 > **Default Value**: NUM_COMMITS (Optional)<br></br>
 > `Config Param: INLINE_COMPACT_TRIGGER_STRATEGY`<br></br>
 
 ---
 
-> #### hoodie.keep.max.commits
-> Archiving service moves older entries from timeline into an archived log 
after each write, to  keep the metadata overhead constant, even as the table 
size grows.This config controls the maximum number of instants to retain in the 
active timeline. <br></br>
-> **Default Value**: 30 (Optional)<br></br>
-> `Config Param: MAX_COMMITS_TO_KEEP`<br></br>
-
----
-
-> #### hoodie.archive.delete.parallelism
-> Parallelism for deleting archived hoodie commits.<br></br>
-> **Default Value**: 100 (Optional)<br></br>
-> `Config Param: DELETE_ARCHIVED_INSTANT_PARALLELISM_VALUE`<br></br>
+> #### hoodie.compaction.reverse.log.read
+> HoodieLogFormatReader reads a logfile in the forward direction starting from 
pos=0 to pos=file_length. If this config is set to true, the reader reads the 
logfile in reverse direction, from pos=file_length to pos=0<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: COMPACTION_REVERSE_LOG_READ_ENABLE`<br></br>
 
 ---
 
@@ -3266,11 +3497,34 @@ Configurations that control compaction (merging of log 
files onto a new base fil
 
 ---
 
+> #### hoodie.compact.inline
+> When set to true, compaction service is triggered after each write. While being simpler operationally, this adds extra latency on the write path.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INLINE_COMPACT`<br></br>
+
+---
+
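The trigger-strategy configs above combine a delta-commit count with an elapsed-time threshold. A simplified model of the five strategies (plain Python; the function and default thresholds are illustrative, not Hudi code — NUM_COMMITS_AFTER_LAST_REQUEST is omitted since it depends on scheduling history):

```python
def should_schedule_compaction(delta_commits, elapsed_seconds,
                               max_delta_commits=5, max_delta_seconds=3600,
                               strategy="NUM_OR_TIME"):
    """Simplified model of hoodie.compact.inline.trigger.strategy."""
    by_num = delta_commits >= max_delta_commits
    by_time = elapsed_seconds >= max_delta_seconds
    return {
        "NUM_COMMITS": by_num,
        "TIME_ELAPSED": by_time,
        "NUM_AND_TIME": by_num and by_time,
        "NUM_OR_TIME": by_num or by_time,
    }[strategy]
```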
 ### File System View Storage Configurations 
{#File-System-View-Storage-Configurations}
 
 Configurations that control how file metadata is stored by Hudi, for 
transaction processing and queries.
 
 `Config Class`: 
org.apache.hudi.common.table.view.FileSystemViewStorageConfig<br></br>
+> #### hoodie.filesystem.view.remote.retry.exceptions
+> The class names of the exceptions that need to be retried, separated by commas. The default is empty, which means retrying all IOExceptions and RuntimeExceptions from the remote request.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: RETRY_EXCEPTIONS`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.remote.retry.initial_interval_ms
+> Amount of time (in ms) to wait before retrying operations on storage.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: REMOTE_INITIAL_RETRY_INTERVAL_MS`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
 > #### hoodie.filesystem.view.spillable.replaced.mem.fraction
 > Fraction of the file system view memory, to be used for holding replace 
 > commit related metadata.<br></br>
 > **Default Value**: 0.01 (Optional)<br></br>
@@ -3299,6 +3553,14 @@ Configurations that control how file metadata is stored 
by Hudi, for transaction
 
 ---
 
+> #### hoodie.filesystem.view.remote.retry.max_numbers
+> Maximum number of retries for API requests against a remote file system view, e.g., the timeline server.<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: REMOTE_MAX_RETRY_NUMBERS`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
 > #### hoodie.filesystem.view.spillable.mem
 > Amount of memory to be used in bytes for holding file system view, before 
 > spilling to disk.<br></br>
 > **Default Value**: 104857600 (Optional)<br></br>
@@ -3313,6 +3575,14 @@ Configurations that control how file metadata is stored 
by Hudi, for transaction
 
 ---
 
+> #### hoodie.filesystem.view.remote.retry.enable
+> Whether to enable API request retry for remote file system view.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: REMOTE_RETRY_ENABLE`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
 > #### hoodie.filesystem.view.remote.host
 > We expect this to be rarely hand configured.<br></br>
 > **Default Value**: localhost (Optional)<br></br>
@@ -3320,6 +3590,14 @@ Configurations that control how file metadata is stored 
by Hudi, for transaction
 
 ---
 
+> #### hoodie.filesystem.view.remote.retry.max_interval_ms
+> Maximum amount of time (in ms), to wait for next retry.<br></br>
+> **Default Value**: 2000 (Optional)<br></br>
+> `Config Param: REMOTE_MAX_RETRY_INTERVAL_MS`<br></br>
+> `Since Version: 0.12.0`<br></br>
+
+---
+
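The three retry-interval configs above describe a bounded backoff. As a sketch of the wait schedule they imply (plain Python; the doubling growth policy is an assumption for illustration — the exact policy is an implementation detail):

```python
def retry_schedule(initial_ms=100, max_interval_ms=2000, max_retries=3):
    """Wait times before each retry, assuming the interval doubles and is
    capped at hoodie.filesystem.view.remote.retry.max_interval_ms."""
    waits, interval = [], initial_ms
    for _ in range(max_retries):
        waits.append(interval)
        interval = min(interval * 2, max_interval_ms)
    return waits
```

With the documented defaults this yields three attempts spaced 100 ms, 200 ms, and 400 ms apart.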
 > #### hoodie.filesystem.view.type
 > File system view provides APIs for viewing the files on the underlying lake 
 > storage,  as file groups and file slices. This config controls how such a 
 > view is held. Options include 
 > MEMORY,SPILLABLE_DISK,EMBEDDED_KV_STORE,REMOTE_ONLY,REMOTE_FIRST which 
 > provide different trade offs for memory usage and API request 
 > performance.<br></br>
 > **Default Value**: MEMORY (Optional)<br></br>
@@ -3417,7 +3695,7 @@ Configurations that control indexing behavior, which tags 
incoming records as ei
 ---
 
 > #### hoodie.bucket.index.num.buckets
-> Only applies if index type is BUCKET_INDEX. Determine the number of buckets 
in the hudi table, and each partition is divided to N buckets.<br></br>
+> Only applies if index type is BUCKET. Determines the number of buckets in the Hudi table; each partition is divided into N buckets.<br></br>
 > **Default Value**: 256 (Optional)<br></br>
 > `Config Param: BUCKET_INDEX_NUM_BUCKETS`<br></br>
 
@@ -3494,6 +3772,14 @@ Configurations that control indexing behavior, which 
tags incoming records as ei
 
 ---
 
+> #### hoodie.index.bucket.engine
+> Type of bucket index engine to use. Default is SIMPLE bucket index, with a fixed number of buckets. Possible options are [SIMPLE | CONSISTENT_HASHING]. Consistent hashing supports dynamic resizing of the number of buckets, solving potential data skew and file size issues of the SIMPLE hashing engine.<br></br>
+> **Default Value**: SIMPLE (Optional)<br></br>
+> `Config Param: BUCKET_INDEX_ENGINE_TYPE`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
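The idea behind the SIMPLE engine above is a stable key-to-bucket mapping with a fixed bucket count per partition. A toy illustration (plain Python; the hash function here is NOT the one Hudi uses, it only demonstrates the fixed-modulo property):

```python
import hashlib

def bucket_for_key(record_key, num_buckets=256):
    """Map a record key to one of a fixed number of buckets.
    Illustrative hash only -- Hudi's actual hashing differs."""
    digest = hashlib.md5(record_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets
```

Because the mapping is deterministic, upserts for the same key always land in the same bucket; resizing `num_buckets` under SIMPLE would remap keys, which is why CONSISTENT_HASHING exists for dynamic resizing.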
 > #### hoodie.bloom.index.filter.dynamic.max.entries
 > The threshold for the maximum number of keys to record in a dynamic Bloom 
 > filter row. Only applies if filter type is 
 > BloomFilterTypeCode.DYNAMIC_V0.<br></br>
 > **Default Value**: 100000 (Optional)<br></br>
@@ -3763,6 +4049,13 @@ The following set of configurations are common across 
Hudi.
 
 ---
 
+> #### hoodie.datasource.write.reconcile.schema
+> When a new batch of writes has records with an old schema, but the latest table schema has evolved, this config will upgrade the records to the latest table schema (default values will be injected into missing fields). If not, the write batch will fail.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: RECONCILE_SCHEMA`<br></br>
+
+---
+
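Conceptually, schema reconciliation above means filling fields missing from an old-schema record with the evolved table schema's defaults. A sketch of that behavior (plain Python; field names and defaults are hypothetical, and this is a conceptual model, not Hudi's Avro-level implementation):

```python
def reconcile_to_table_schema(record, table_defaults):
    """Fill fields missing from an old-schema record with the table
    schema's defaults; fields present in the record are kept as-is."""
    merged = dict(table_defaults)
    merged.update(record)
    return merged

# Hypothetical evolved table schema (field -> default value).
table_defaults = {"id": None, "name": None, "country": "unknown"}
# A record written before the 'country' column was added.
old_record = {"id": 1, "name": "a"}
```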
 > #### hoodie.common.spillable.diskmap.type
 > When handling input data that cannot be held in memory, to merge with a file 
 > on storage, a spillable diskmap is employed.  By default, we use a 
 > persistent hashmap based loosely on bitcask, that offers O(1) inserts, 
 > lookups. Change this to `ROCKS_DB` to prefer using rocksDB, for handling the 
 > spill.<br></br>
 > **Default Value**: BITCASK (Optional)<br></br>
@@ -3770,6 +4063,13 @@ The following set of configurations are common across 
Hudi.
 
 ---
 
+> #### hoodie.schema.on.read.enable
+> Enables support for the schema evolution feature.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: SCHEMA_EVOLUTION_ENABLE`<br></br>
+
+---
+
 ### Bootstrap Configs {#Bootstrap-Configs}
 
 Configurations that control how you want to bootstrap your existing tables for 
the first time into hudi. The bootstrap operation can flexibly avoid copying 
data over before you can use Hudi and support running the existing  writers and 
new hudi writers in parallel, to validate the migration.
@@ -3935,6 +4235,43 @@ Enables reporting on Hudi metrics using the Datadog 
reporter type. Hudi publishe
 
 ---
 
+### Metrics Configurations for Amazon CloudWatch 
{#Metrics-Configurations-for-Amazon-CloudWatch}
+
+Enables reporting on Hudi metrics using Amazon CloudWatch.  Hudi publishes 
metrics on every commit, clean, rollback etc.
+
+`Config Class`: 
org.apache.hudi.config.metrics.HoodieMetricsCloudWatchConfig<br></br>
+> #### hoodie.metrics.cloudwatch.report.period.seconds
+> Reporting interval in seconds<br></br>
+> **Default Value**: 60 (Optional)<br></br>
+> `Config Param: REPORT_PERIOD_SECONDS`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.metrics.cloudwatch.namespace
+> Namespace of reporter<br></br>
+> **Default Value**: Hudi (Optional)<br></br>
+> `Config Param: METRIC_NAMESPACE`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.metrics.cloudwatch.metric.prefix
+> Metric prefix of reporter<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: METRIC_PREFIX`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.metrics.cloudwatch.maxDatumsPerRequest
+> Max number of Datums per request<br></br>
+> **Default Value**: 20 (Optional)<br></br>
+> `Config Param: MAX_DATUMS_PER_REQUEST`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
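The maxDatumsPerRequest config above caps how many metric datums go into a single CloudWatch request, so a large batch of metrics is split into multiple requests. A sketch of that batching (plain Python; the helper is illustrative, not Hudi code):

```python
def batch_datums(datums, max_per_request=20):
    """Split metric datums into request-sized batches, honoring
    hoodie.metrics.cloudwatch.maxDatumsPerRequest."""
    return [datums[i:i + max_per_request]
            for i in range(0, len(datums), max_per_request)]
```

With the default of 20, publishing 45 datums would take 3 requests.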
 ### Metrics Configurations {#Metrics-Configurations}
 
 Enables reporting on Hudi metrics. Hudi publishes metrics on every commit, 
clean, rollback etc. The following sections list the supported reporters.
@@ -4062,43 +4399,6 @@ Enables reporting on Hudi metrics using Prometheus.  
Hudi publishes metrics on e
 
 ---
 
-### Metrics Configurations for Amazon CloudWatch 
{#Metrics-Configurations-for-Amazon-CloudWatch}
-
-Enables reporting on Hudi metrics using Amazon CloudWatch.  Hudi publishes 
metrics on every commit, clean, rollback etc.
-
-`Config Class`: org.apache.hudi.config.HoodieMetricsCloudWatchConfig<br></br>
-> #### hoodie.metrics.cloudwatch.report.period.seconds
-> Reporting interval in seconds<br></br>
-> **Default Value**: 60 (Optional)<br></br>
-> `Config Param: REPORT_PERIOD_SECONDS`<br></br>
-> `Since Version: 0.10.0`<br></br>
-
----
-
-> #### hoodie.metrics.cloudwatch.namespace
-> Namespace of reporter<br></br>
-> **Default Value**: Hudi (Optional)<br></br>
-> `Config Param: METRIC_NAMESPACE`<br></br>
-> `Since Version: 0.10.0`<br></br>
-
----
-
-> #### hoodie.metrics.cloudwatch.metric.prefix
-> Metric prefix of reporter<br></br>
-> **Default Value**:  (Optional)<br></br>
-> `Config Param: METRIC_PREFIX`<br></br>
-> `Since Version: 0.10.0`<br></br>
-
----
-
-> #### hoodie.metrics.cloudwatch.maxDatumsPerRequest
-> Max number of Datums per request<br></br>
-> **Default Value**: 20 (Optional)<br></br>
-> `Config Param: MAX_DATUMS_PER_REQUEST`<br></br>
-> `Since Version: 0.10.0`<br></br>
-
----
-
 ### Metrics Configurations for Graphite {#Metrics-Configurations-for-Graphite}
 
 Enables reporting on Hudi metrics using Graphite.  Hudi publishes metrics on every commit, clean, rollback etc.
@@ -4144,6 +4444,13 @@ This is the lowest level of customization offered by Hudi. Record payloads defin
 Payload related configs, that can be leveraged to control merges based on specific business fields in the data.
 
 `Config Class`: org.apache.hudi.config.HoodiePayloadConfig<br></br>
+> #### hoodie.compaction.payload.class
+> This needs to be the same as the class used during insert/upserts. Just like writing, compaction also uses the record payload class to merge records in the log against each other, merge again with the base file, and produce the final record to be written after compaction.<br></br>
+> **Default Value**: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload (Optional)<br></br>
+> `Config Param: PAYLOAD_CLASS_NAME`<br></br>
+
+---
+
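The new `hoodie.compaction.payload.class` entry above stresses that compaction must use the same payload class as the writer. As a hedged sketch of keeping the two aligned (the write-side key `hoodie.datasource.write.payload.class` is an assumed standard Hudi option and does not appear in this diff; the compaction key and its default come from the text above):

```python
# Sketch under assumptions: keep the write-side and compaction-side payload
# classes aligned, as the description above requires.
PAYLOAD_CLASS = "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload"

hudi_opts = {
    # Assumed standard write-side option key (not part of this diff).
    "hoodie.datasource.write.payload.class": PAYLOAD_CLASS,
    # Documented in this diff; must match the write-side class.
    "hoodie.compaction.payload.class": PAYLOAD_CLASS,
}

# Both keys must reference the same class so compaction merges log records
# the same way the writer did.
assert len(set(hudi_opts.values())) == 1
print("payload classes aligned:", PAYLOAD_CLASS)
```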
 > #### hoodie.payload.event.time.field
 > Table column/field name to derive timestamp associated with the records. 
 > This can be useful, e.g., for determining the freshness of the table.<br></br>
 > **Default Value**: ts (Optional)<br></br>
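The CloudWatch metrics section added above lists four reporter properties and their defaults. As a non-authoritative sketch of how they might be assembled into write options (the `hoodie.metrics.on` and `hoodie.metrics.reporter.type` enabling keys are assumptions not shown in this commit; the `cloudwatch.*` keys and defaults come from the diff):

```python
# Sketch only: CloudWatch reporter options documented above, assembled as a
# Hudi write-options dict. Values mirror the documented defaults except the
# metric prefix, which is illustrative.
cloudwatch_metrics_opts = {
    "hoodie.metrics.on": "true",                              # assumed enabling flag
    "hoodie.metrics.reporter.type": "CLOUDWATCH",             # assumed reporter type
    "hoodie.metrics.cloudwatch.report.period.seconds": "60",  # default per docs
    "hoodie.metrics.cloudwatch.namespace": "Hudi",            # default per docs
    "hoodie.metrics.cloudwatch.metric.prefix": "prod",        # illustrative prefix
    "hoodie.metrics.cloudwatch.maxDatumsPerRequest": "20",    # default per docs
}

for key, value in sorted(cloudwatch_metrics_opts.items()):
    print(f"{key}={value}")
```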
diff --git a/website/docs/writing_data.md b/website/docs/writing_data.md
index 3c0a516e2c..27ef443d95 100644
--- a/website/docs/writing_data.md
+++ b/website/docs/writing_data.md
@@ -43,7 +43,7 @@ Available values:<br/>
 
 **HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY**: If using hive, specify if the table should or should not be partitioned.<br/>
 Available values:<br/>
-`classOf[SlashEncodedDayPartitionValueExtractor].getCanonicalName` (default), `classOf[MultiPartKeysValueExtractor].getCanonicalName`, `classOf[TimestampBasedKeyGenerator].getCanonicalName`, `classOf[NonPartitionedExtractor].getCanonicalName`, `classOf[GlobalDeleteKeyGenerator].getCanonicalName` (to be used when `OPERATION_OPT_KEY` is set to `DELETE_OPERATION_OPT_VAL`)
+`classOf[MultiPartKeysValueExtractor].getCanonicalName` (default), `classOf[SlashEncodedDayPartitionValueExtractor].getCanonicalName`, `classOf[TimestampBasedKeyGenerator].getCanonicalName`, `classOf[NonPartitionedExtractor].getCanonicalName`, `classOf[GlobalDeleteKeyGenerator].getCanonicalName` (to be used when `OPERATION_OPT_KEY` is set to `DELETE_OPERATION_OPT_VAL`)
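The changed line above swaps the documented default from `SlashEncodedDayPartitionValueExtractor` to `MultiPartKeysValueExtractor`. As a hedged sketch (the `hoodie.datasource.hive_sync.*` option keys below are assumed standard Hudi hive-sync options, not shown in this diff; the extractor class names appear in the change itself):

```python
# Sketch under assumptions: hive-sync options matching the updated default
# partition value extractor described in the diff above.
hive_sync_opts = {
    "hoodie.datasource.hive_sync.enable": "true",  # assumed enabling flag
    "hoodie.datasource.hive_sync.partition_extractor_class":
        # New documented default per this change.
        "org.apache.hudi.hive.MultiPartKeysValueExtractor",
}

extractor = hive_sync_opts["hoodie.datasource.hive_sync.partition_extractor_class"]
# Print just the simple class name of the configured extractor.
print(extractor.rsplit(".", 1)[-1])
```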
 
 
 Example:
