[hudi] branch asf-site updated: [Docs] Improving Hudi Configurations docs (#3145)

leesf Mon, 28 Jun 2021 03:11:52 -0700

This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 561e857  [Docs] Improving Hudi Configurations docs (#3145)
561e857 is described below

commit 561e857423d56efef679362c71aeb98235d1e9e4
Author: zhangyue19921010 <[email protected]>
AuthorDate: Mon Jun 28 18:11:15 2021 +0800

    [Docs] Improving Hudi Configurations docs (#3145)
    
    Co-authored-by: yuezhang <[email protected]>
---
 docs/_docs/2_4_configurations.md   | 962 ++++++++++++-------------------------
 docs/_sass/hudi_style/_tables.scss |   4 +
 2 files changed, 308 insertions(+), 658 deletions(-)

diff --git a/docs/_docs/2_4_configurations.md b/docs/_docs/2_4_configurations.md
index 1a8121a..88e9898 100644
--- a/docs/_docs/2_4_configurations.md
+++ b/docs/_docs/2_4_configurations.md
@@ -43,139 +43,52 @@ inputDF.write()
 
 Options useful for writing tables via `write.format.option(...)`
 
-#### TABLE_NAME_OPT_KEY {#TABLE_NAME_OPT_KEY}
-  Property: `hoodie.datasource.write.table.name` [Required]<br/>
-  <span style="color:grey">Hive table name, to register the table into.</span>
-  
-#### OPERATION_OPT_KEY {#OPERATION_OPT_KEY}
-  Property: `hoodie.datasource.write.operation`, Default: `upsert`<br/>
-  <span style="color:grey">whether to do upsert, insert or bulkinsert for the 
write operation. Use `bulkinsert` to load new data into a table, and there on 
use `upsert`/`insert`. 
-  bulk insert uses a disk based write path to scale to load large inputs 
without need to cache it.</span>
-  
-#### TABLE_TYPE_OPT_KEY {#TABLE_TYPE_OPT_KEY}
-  Property: `hoodie.datasource.write.table.type`, Default: `COPY_ON_WRITE` 
<br/>
-  <span style="color:grey">The table type for the underlying data, for this 
write. This can't change between writes.</span>
-  
-#### PRECOMBINE_FIELD_OPT_KEY {#PRECOMBINE_FIELD_OPT_KEY}
-  Property: `hoodie.datasource.write.precombine.field`, Default: `ts` <br/>
-  <span style="color:grey">Field used in preCombining before actual write. 
When two records have the same key value,
-we will pick the one with the largest value for the precombine field, 
determined by Object.compareTo(..)</span>
-
-#### PAYLOAD_CLASS_OPT_KEY {#PAYLOAD_CLASS_OPT_KEY}
-  Property: `hoodie.datasource.write.payload.class`, Default: 
`org.apache.hudi.OverwriteWithLatestAvroPayload` <br/>
-  <span style="color:grey">Payload class used. Override this, if you like to 
roll your own merge logic, when upserting/inserting. 
-  This will render any value set for `PRECOMBINE_FIELD_OPT_VAL` 
in-effective</span>
-  
-#### RECORDKEY_FIELD_OPT_KEY {#RECORDKEY_FIELD_OPT_KEY}
-  Property: `hoodie.datasource.write.recordkey.field`, Default: `uuid` <br/>
-  <span style="color:grey">Record key field. Value to be used as the 
`recordKey` component of `HoodieKey`. Actual value
-will be obtained by invoking .toString() on the field value. Nested fields can 
be specified using
-the dot notation eg: `a.b.c`</span>
-
-#### PARTITIONPATH_FIELD_OPT_KEY {#PARTITIONPATH_FIELD_OPT_KEY}
-  Property: `hoodie.datasource.write.partitionpath.field`, Default: 
`partitionpath` <br/>
-  <span style="color:grey">Partition path field. Value to be used at the 
`partitionPath` component of `HoodieKey`.
-Actual value ontained by invoking .toString()</span>
-
-#### HIVE_STYLE_PARTITIONING_OPT_KEY {#HIVE_STYLE_PARTITIONING_OPT_KEY}
-  Property: `hoodie.datasource.write.hive_style_partitioning`, Default: 
`false` <br/>
-  <span style="color:grey">When set to true, partition folder names follow the 
format of Hive partitions: <partition_column_name>=<partition_value></span>
-
-#### KEYGENERATOR_CLASS_OPT_KEY {#KEYGENERATOR_CLASS_OPT_KEY}
-  Property: `hoodie.datasource.write.keygenerator.class`, Default: 
`org.apache.hudi.keygen.SimpleKeyGenerator` <br/>
-  <span style="color:grey">Key generator class, that implements will extract 
the key out of incoming `Row` object</span>
-  
-#### COMMIT_METADATA_KEYPREFIX_OPT_KEY {#COMMIT_METADATA_KEYPREFIX_OPT_KEY}
-  Property: `hoodie.datasource.write.commitmeta.key.prefix`, Default: `_` <br/>
-  <span style="color:grey">Option keys beginning with this prefix, are 
automatically added to the commit/deltacommit metadata.
-This is useful to store checkpointing information, in a consistent way with 
the hudi timeline</span>
-
-#### INSERT_DROP_DUPS_OPT_KEY {#INSERT_DROP_DUPS_OPT_KEY}
-  Property: `hoodie.datasource.write.insert.drop.duplicates`, Default: `false` 
<br/>
-  <span style="color:grey">If set to true, filters out all duplicate records 
from incoming dataframe, during insert operations. </span>
-
-#### ENABLE_ROW_WRITER_OPT_KEY {#ENABLE_ROW_WRITER_OPT_KEY}
-  Property: `hoodie.datasource.write.row.writer.enable`, Default: `false` <br/>
-  <span style="color:grey">When set to true, will perform write operations 
directly using the spark native `Row` 
-  representation. This is expected to be faster by 20 to 30% than regular 
bulk_insert by setting this config</span>
-
-#### HIVE_SYNC_ENABLED_OPT_KEY {#HIVE_SYNC_ENABLED_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.enable`, Default: `false` <br/>
-  <span style="color:grey">When set to true, register/sync the table to Apache 
Hive metastore</span>
-
-#### HIVE_DATABASE_OPT_KEY {#HIVE_DATABASE_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.database`, Default: `default` <br/>
-  <span style="color:grey">database to sync to</span>
-  
-#### HIVE_TABLE_OPT_KEY {#HIVE_TABLE_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.table`, [Required] <br/>
-  <span style="color:grey">table to sync to</span>
-  
-#### HIVE_USER_OPT_KEY {#HIVE_USER_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.username`, Default: `hive` <br/>
-  <span style="color:grey">hive user name to use</span>
-  
-#### HIVE_PASS_OPT_KEY {#HIVE_PASS_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.password`, Default: `hive` <br/>
-  <span style="color:grey">hive password to use</span>
-  
-#### HIVE_URL_OPT_KEY {#HIVE_URL_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.jdbcurl`, Default: 
`jdbc:hive2://localhost:10000` <br/>
-  <span style="color:grey">Hive metastore url</span>
-  
-#### HIVE_PARTITION_FIELDS_OPT_KEY {#HIVE_PARTITION_FIELDS_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.partition_fields`, Default: ` ` <br/>
-  <span style="color:grey">field in the table to use for determining hive 
partition columns.</span>
-  
-#### HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY 
{#HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.partition_extractor_class`, Default: 
`org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor` <br/>
-  <span style="color:grey">Class used to extract partition field values into 
hive partition columns.</span>
-  
-#### HIVE_ASSUME_DATE_PARTITION_OPT_KEY {#HIVE_ASSUME_DATE_PARTITION_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.assume_date_partitioning`, Default: 
`false` <br/>
-  <span style="color:grey">Assume partitioning is yyyy/mm/dd</span>
-  
-#### HIVE_USE_JDBC_OPT_KEY {#HIVE_USE_JDBC_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.use_jdbc`, Default: `true` <br/>
-  <span style="color:grey">Use JDBC when hive synchronization is enabled</span>
-
-#### HIVE_AUTO_CREATE_DATABASE_OPT_KEY {#HIVE_AUTO_CREATE_DATABASE_OPT_KEY}
-Property: `hoodie.datasource.hive_sync.auto_create_database` Default: `true` 
<br/>
-<span style="color:grey"> Auto create hive database if does not exists. 
<b>Note</b>: for versions 0.7 and 0.8 you will have to explicitly set this to 
true </span>
-
-#### HIVE_SKIP_RO_SUFFIX {#HIVE_SKIP_RO_SUFFIX}
-Property: `hoodie.datasource.hive_sync.skip_ro_suffix` Default: `false` <br/>
-<span style="color:grey"> Skip the `_ro` suffix for Read optimized table, when 
registering</span>
-
-#### HIVE_SUPPORT_TIMESTAMP {#HIVE_SUPPORT_TIMESTAMP}
-Property: `hoodie.datasource.hive_sync.support_timestamp` Default: `false` 
<br/>
-<span style="color:grey"> 'INT64' with original type TIMESTAMP_MICROS is 
converted to hive 'timestamp' type. Disabled by default for backward 
compatibility. </span>
+<div class="table-wrapper" markdown="block">
+
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| TABLE_NAME_OPT_KEY | hoodie.datasource.write.table.name | YES | N/A | Hive table name to register the table into. |
+| OPERATION_OPT_KEY | hoodie.datasource.write.operation | NO | upsert | Whether to do upsert, insert or bulkinsert for the write operation. Use bulkinsert to load new data into a table, and from then on use upsert/insert. Bulk insert uses a disk-based write path to scale to large inputs without needing to cache them. |
+| TABLE_TYPE_OPT_KEY | hoodie.datasource.write.table.type | NO | COPY_ON_WRITE | The table type for the underlying data, for this write. This can’t change between writes. |
+| PRECOMBINE_FIELD_OPT_KEY | hoodie.datasource.write.precombine.field | NO | ts | Field used in preCombining before the actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..) |
+| PAYLOAD_CLASS_OPT_KEY | hoodie.datasource.write.payload.class | NO | org.apache.hudi.OverwriteWithLatestAvroPayload | Payload class used. Override this if you would like to roll your own merge logic when upserting/inserting. This will render any value set for PRECOMBINE_FIELD_OPT_VAL ineffective. |
+| RECORDKEY_FIELD_OPT_KEY | hoodie.datasource.write.recordkey.field | NO | uuid | Record key field. Value to be used as the recordKey component of HoodieKey. The actual value will be obtained by invoking .toString() on the field value. Nested fields can be specified using dot notation, e.g. a.b.c |
+| PARTITIONPATH_FIELD_OPT_KEY | hoodie.datasource.write.partitionpath.field | NO | partitionpath | Partition path field. Value to be used as the partitionPath component of HoodieKey. The actual value is obtained by invoking .toString() |
+| HIVE_STYLE_PARTITIONING_OPT_KEY | hoodie.datasource.write.hive_style_partitioning | NO | false | If set to true, the names of partition folders follow the <partition_column_name>=<partition_value> format. |
+| KEYGENERATOR_CLASS_OPT_KEY | hoodie.datasource.write.keygenerator.class | NO | org.apache.hudi.keygen.SimpleKeyGenerator | Key generator class, whose implementation extracts the key out of the incoming Row object. |
+| COMMIT_METADATA_KEYPREFIX_OPT_KEY | hoodie.datasource.write.commitmeta.key.prefix | NO | _ | Option keys beginning with this prefix are automatically added to the commit/deltacommit metadata. This is useful for storing checkpointing information in a way that is consistent with the hudi timeline. |
+| INSERT_DROP_DUPS_OPT_KEY | hoodie.datasource.write.insert.drop.duplicates | NO | false | If set to true, filters out all duplicate records from the incoming dataframe during insert operations. |
+| ENABLE_ROW_WRITER_OPT_KEY | hoodie.datasource.write.row.writer.enable | NO | false | When set to true, will perform write operations directly using the spark native Row representation. With this config set, writes are expected to be 20 to 30% faster than regular bulk_insert. |
+| HIVE_SYNC_ENABLED_OPT_KEY | hoodie.datasource.hive_sync.enable | NO | false | When set to true, register/sync the table to Apache Hive metastore. |
+| HIVE_DATABASE_OPT_KEY | hoodie.datasource.hive_sync.database | NO | default | Database to sync to. |
+| HIVE_TABLE_OPT_KEY | hoodie.datasource.hive_sync.table | YES | N/A | Table to sync to. |
+| HIVE_USER_OPT_KEY | hoodie.datasource.hive_sync.username | NO | hive | Hive user name to use. |
+| HIVE_PASS_OPT_KEY | hoodie.datasource.hive_sync.password | NO | hive | Hive password to use. |
+| HIVE_URL_OPT_KEY | hoodie.datasource.hive_sync.jdbcurl | NO | jdbc:hive2://localhost:10000 | Hive metastore url. |
+| HIVE_PARTITION_FIELDS_OPT_KEY | hoodie.datasource.hive_sync.partition_fields | NO |   | Field in the table to use for determining hive partition columns. |
+| HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY | hoodie.datasource.hive_sync.partition_extractor_class | NO | org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor | Class used to extract partition field values into hive partition columns. |
+| HIVE_ASSUME_DATE_PARTITION_OPT_KEY | hoodie.datasource.hive_sync.assume_date_partitioning | NO | false | Assume partitioning is yyyy/mm/dd. |
+| HIVE_USE_JDBC_OPT_KEY | hoodie.datasource.hive_sync.use_jdbc | NO | true | Use JDBC when hive synchronization is enabled. |
+| HIVE_AUTO_CREATE_DATABASE_OPT_KEY | hoodie.datasource.hive_sync.auto_create_database | NO | true | Automatically create the hive database if it does not exist. Note: for versions 0.7 and 0.8 you will have to explicitly set this to true. |
+| HIVE_SKIP_RO_SUFFIX | hoodie.datasource.hive_sync.skip_ro_suffix | NO | false | Skip the _ro suffix for the Read optimized table when registering. |
+| HIVE_SUPPORT_TIMESTAMP | hoodie.datasource.hive_sync.support_timestamp | NO | false | ‘INT64’ with original type TIMESTAMP_MICROS is converted to hive ‘timestamp’ type. Disabled by default for backward compatibility. |
+
+</div>
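
As a rough illustration of how the write options in the table above fit together, the following Python sketch assembles an option map from the listed property names and defaults and checks the two properties marked as required. The helper `build_write_options` and the constants are hypothetical and not part of Hudi; the resulting dict is only meant to show what might be handed to a Spark writer's option calls.

```python
# Hypothetical helper, not a Hudi API: collects datasource write options
# using the property names and defaults listed in the table above.

# Properties marked "YES" in the Required column.
REQUIRED_WRITE_KEYS = (
    "hoodie.datasource.write.table.name",
    "hoodie.datasource.hive_sync.table",
)

# A subset of the documented defaults, kept as strings like Spark options.
DEFAULT_WRITE_OPTIONS = {
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.recordkey.field": "uuid",
    "hoodie.datasource.write.partitionpath.field": "partitionpath",
    "hoodie.datasource.hive_sync.enable": "false",
}


def build_write_options(overrides):
    """Merge user overrides onto the defaults and verify required keys."""
    opts = dict(DEFAULT_WRITE_OPTIONS)
    opts.update(overrides)
    missing = [key for key in REQUIRED_WRITE_KEYS if key not in opts]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    return opts
```

A caller would pass only the values that differ from the defaults, for example the table names and a non-default operation; everything else falls back to the documented default.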
 
 ### Read Options
 
 Options useful for reading tables via `read.format.option(...)`
 
-#### QUERY_TYPE_OPT_KEY {#QUERY_TYPE_OPT_KEY}
-Property: `hoodie.datasource.query.type`, Default: `snapshot` <br/>
-<span style="color:grey">Whether data needs to be read, in incremental mode 
(new data since an instantTime)
-(or) Read Optimized mode (obtain latest view, based on columnar data)
-(or) Snapshot mode (obtain latest view, based on row & columnar data)</span>
-
-#### BEGIN_INSTANTTIME_OPT_KEY {#BEGIN_INSTANTTIME_OPT_KEY} 
-Property: `hoodie.datasource.read.begin.instanttime`, [Required in incremental 
mode] <br/>
-<span style="color:grey">Instant time to start incrementally pulling data 
from. The instanttime here need not
-necessarily correspond to an instant on the timeline. New data written with an
- `instant_time > BEGIN_INSTANTTIME` are fetched out. For e.g: '20170901080000' 
will get
- all new data written after Sep 1, 2017 08:00AM.</span>
- 
-#### END_INSTANTTIME_OPT_KEY {#END_INSTANTTIME_OPT_KEY}
-Property: `hoodie.datasource.read.end.instanttime`, Default: latest instant 
(i.e fetches all new data since begin instant time) <br/>
-<span style="color:grey"> Instant time to limit incrementally fetched data to. 
New data written with an
-`instant_time <= END_INSTANTTIME` are fetched out.</span>
-
-#### INCREMENTAL_READ_SCHEMA_USE_END_INSTANTTIME_OPT_KEY 
{#INCREMENTAL_READ_SCHEMA_USE_END_INSTANTTIME_OPT_KEY}
-Property: `hoodie.datasource.read.schema.use.end.instanttime`, Default: false 
<br/>
-<span style="color:grey"> Uses end instant schema when incrementally fetched 
data to. Default: users latest instant schema. </span>
+<div class="table-wrapper" markdown="block">
+
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| QUERY_TYPE_OPT_KEY | hoodie.datasource.query.type | NO | snapshot | Whether data needs to be read in incremental mode (new data since an instantTime), Read Optimized mode (obtain latest view, based on columnar data), or Snapshot mode (obtain latest view, based on row & columnar data). |
+| BEGIN_INSTANTTIME_OPT_KEY | hoodie.datasource.read.begin.instanttime | Required in incremental mode | N/A | Instant time to start incrementally pulling data from. The instant time here need not necessarily correspond to an instant on the timeline. New data written with an instant_time > BEGIN_INSTANTTIME is fetched. For example, ‘20170901080000’ will get all new data written after Sep 1, 2017 08:00AM. |
+| END_INSTANTTIME_OPT_KEY | hoodie.datasource.read.end.instanttime | NO | latest instant (i.e. fetches all new data since begin instant time) | Instant time to limit incrementally fetched data to. New data written with an instant_time <= END_INSTANTTIME is fetched. |
+| INCREMENTAL_READ_SCHEMA_USE_END_INSTANTTIME_OPT_KEY | hoodie.datasource.read.schema.use.end.instanttime | NO | false | Uses the end instant's schema when incrementally fetching data. Default: uses the latest instant's schema. |
+
+</div>
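
The begin/end instant semantics in the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: instants are taken to be 14-digit strings in the `yyyyMMddHHmmss` layout of the ‘20170901080000’ example, and the function names are hypothetical, not Hudi APIs.

```python
from datetime import datetime

# Assumed layout of an instant time, matching the example '20170901080000'.
INSTANT_FORMAT = "%Y%m%d%H%M%S"


def parse_instant(instant):
    """Convert a 14-digit instant string into a datetime."""
    return datetime.strptime(instant, INSTANT_FORMAT)


def in_incremental_window(instant_time, begin, end=None):
    """Return True if begin < instant_time <= end.

    end=None stands for "latest instant", i.e. no upper bound.
    Fixed-width digit strings compare correctly as plain strings.
    """
    if instant_time <= begin:
        return False
    return end is None or instant_time <= end
```

The begin bound is exclusive and the end bound inclusive, so passing the end instant of one pull as the begin instant of the next yields neither gaps nor duplicates.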
 
 ## Flink SQL Config Options {#flink-options}
 
@@ -184,6 +97,8 @@ The actual datasource level configs are listed below.
 
 ### Write Options
 
+<div class="table-wrapper" markdown="block">
+
 |  Option Name  | Required | Default | Remarks |
 |  -----------  | -------  | ------- | ------- |
 | `path` | Y | N/A | <span style="color:grey"> Base path for the target hoodie table. The path would be created if it does not exist, otherwise a hudi table expects to be initialized successfully </span> |
@@ -198,8 +113,12 @@ The actual datasource level configs are listed below.
 | `write.partition.url_encode` | N | false | <span style="color:grey"> Whether to encode the partition path url, default false </span> |
 | `write.log.max.size` | N | 1024 | <span style="color:grey"> Maximum size allowed in MB for a log file before it is rolled over to the next version, default 1GB </span> |
 
+</div>
+
 If the table type is MERGE_ON_READ, you can also specify the asynchronous compaction strategy through options:
 
+<div class="table-wrapper" markdown="block">
+
 |  Option Name  | Required | Default | Remarks |
 |  -----------  | -------  | ------- | ------- |
 | `compaction.tasks` | N | 10 | <span style="color:grey"> Parallelism of tasks that do actual compaction, default is 10 </span> |
@@ -211,8 +130,12 @@ If the table type is MERGE_ON_READ, you can also specify the asynchronous compac
 | `clean.async.enabled` | N | true | <span style="color:grey"> Whether to cleanup the old commits immediately on new commits, enabled by default </span> |
 | `clean.retain_commits` | N | 10 | <span style="color:grey"> Number of commits to retain. So data will be retained for num_of_commits * time_between_commits (scheduled). This also directly translates into how much you can incrementally pull on this table, default 10 </span> |
 
+</div>
+
 Options about memory consumption:
 
+<div class="table-wrapper" markdown="block">
+
 |  Option Name  | Required | Default | Remarks |
 |  -----------  | -------  | ------- | ------- |
 | `write.rate.limit` | N | -1 | <span style="color:grey"> Write records rate limit per second to reduce risk of OOM, default -1 (no limit) </span> |
@@ -220,8 +143,12 @@ Options about memory consumption:
 | `write.log_block.size` | N | 128 | <span style="color:grey"> Max log block size in MB for log file, default 128MB </span> |
 | `compaction.max_memory` | N | 100 | <span style="color:grey"> Max memory in MB for compaction spillable map, default 100MB </span> |
 
+</div>
+
 ### Read Options
 
+<div class="table-wrapper" markdown="block">
+
 |  Option Name  | Required | Default | Remarks |
 |  -----------  | -------  | ------- | ------- |
 | `path` | Y | N/A | <span style="color:grey"> Base path for the target hoodie table. The path would be created if it does not exist, otherwise a hudi table expects to be initialized successfully </span> |
@@ -234,23 +161,35 @@ Options about memory consumption:
 | `hoodie.datasource.hive_style_partition` | N | false | <span style="color:grey"> Whether the partition path is with Hive style, e.g. '{partition key}={partition value}', default false </span> |
 | `read.utc-timezone` | N | true | <span style="color:grey"> Use UTC timezone or local timezone for the conversion between epoch time and LocalDateTime. Hive 0.x/1.x/2.x use the local timezone, but Hive 3.x uses the UTC timezone; default true </span> |
 
+</div>
+
 Streaming read is supported through options:
 
+<div class="table-wrapper" markdown="block">
+
 |  Option Name  | Required | Default | Remarks |
 |  -----------  | -------  | ------- | ------- |
 | `read.streaming.enabled` | N | false | <span style="color:grey"> Whether to read as streaming source, default false </span> |
 | `read.streaming.check-interval` | N | 60 | <span style="color:grey"> Check interval for streaming read, in seconds; default 1 minute </span> |
 | `read.streaming.start-commit` | N | N/A | <span style="color:grey"> Start commit instant for streaming read, the commit time format should be 'yyyyMMddHHmmss', by default reading from the latest instant </span> |
 
+</div>
+
 ### Index sync options
 
+<div class="table-wrapper" markdown="block">
+
 |  Option Name  | Required | Default | Remarks |
 |  -----------  | -------  | ------- | ------- |
 | `index.bootstrap.enabled` | N | false | <span style="color:grey"> Whether to bootstrap the index state from existing hoodie table, default false </span> |
 | `index.state.ttl` | N | 1.5 | <span style="color:grey"> Index state ttl in days, default 1.5 days </span> |
 
+</div>
+
 ### Hive sync options
 
+<div class="table-wrapper" markdown="block">
+
 |  Option Name  | Required | Default | Remarks |
 |  -----------  | -------  | ------- | ------- |
 | `hive_sync.enable` | N | false | <span style="color:grey"> Asynchronously sync Hive meta to HMS, default false </span> |
@@ -269,6 +208,8 @@ Streaming read is supported through options:
 | `hive_sync.skip_ro_suffix` | N | false | <span style="color:grey"> Skip the _ro suffix for Read optimized table when registering, default false </span> |
 | `hive_sync.support_timestamp` | N | false | <span style="color:grey"> INT64 with original type TIMESTAMP_MICROS is converted to hive timestamp type. Disabled by default for backward compatibility </span> |
 
+</div>
+
 ## WriteClient Configs {#writeclient-configs}
 
 Jobs programming directly against the RDD level apis can build a `HoodieWriteConfig` object and pass it in to the `HoodieWriteClient` constructor. 
@@ -288,78 +229,29 @@ HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder()
 
 The following subsections go over different aspects of write configs, explaining the most important configs with their property names and default values.
 
-#### withPath(hoodie_base_path) {#withPath}
-Property: `hoodie.base.path` [Required] <br/>
-<span style="color:grey">Base DFS path under which all the data partitions are 
created. Always prefix it explicitly with the storage scheme (e.g hdfs://, 
s3:// etc). Hudi stores all the main meta-data about commits, savepoints, 
cleaning audit logs etc in .hoodie directory under the base directory. </span>
-
-#### withSchema(schema_str) {#withSchema} 
-Property: `hoodie.avro.schema` [Required]<br/>
-<span style="color:grey">This is the current reader avro schema for the table. 
This is a string of the entire schema. HoodieWriteClient uses this schema to 
pass on to implementations of HoodieRecordPayload to convert from the source 
format to avro record. This is also used when re-writing records during an 
update. </span>
-
-#### forTable(table_name) {#forTable} 
-Property: `hoodie.table.name` [Required] <br/>
- <span style="color:grey">Table name that will be used for registering with 
Hive. Needs to be same across runs.</span>
-
-#### withBulkInsertParallelism(bulk_insert_parallelism = 1500) 
{#withBulkInsertParallelism} 
-Property: `hoodie.bulkinsert.shuffle.parallelism`<br/>
-<span style="color:grey">Bulk insert is meant to be used for large initial 
imports and this parallelism determines the initial number of files in your 
table. Tune this to achieve a desired optimal size during initial import.</span>
-
-#### withUserDefinedBulkInsertPartitionerClass(className = 
x.y.z.UserDefinedPatitionerClass) {#withUserDefinedBulkInsertPartitionerClass} 
-Property: `hoodie.bulkinsert.user.defined.partitioner.class`<br/>
-<span style="color:grey">If specified, this class will be used to re-partition 
input records before they are inserted.</span>
-
-#### withBulkInsertSortMode(mode = BulkInsertSortMode.GLOBAL_SORT) 
{#withBulkInsertSortMode} 
-Property: `hoodie.bulkinsert.sort.mode`<br/>
-<span style="color:grey">Sorting modes to use for sorting records for bulk 
insert. This is leveraged when user defined partitioner is not configured. 
Default is GLOBAL_SORT. 
-   Available values are - **GLOBAL_SORT**:  this ensures best file sizes, with 
lowest memory overhead at cost of sorting. 
-  **PARTITION_SORT**: Strikes a balance by only sorting within a partition, 
still keeping the memory overhead of writing lowest and best effort file 
sizing. 
-  **NONE**: No sorting. Fastest and matches `spark.write.parquet()` in terms 
of number of files, overheads 
-</span>
-
-#### withParallelism(insert_shuffle_parallelism = 1500, 
upsert_shuffle_parallelism = 1500) {#withParallelism} 
-Property: `hoodie.insert.shuffle.parallelism`, 
`hoodie.upsert.shuffle.parallelism`<br/>
-<span style="color:grey">Once data has been initially imported, this 
parallelism controls initial parallelism for reading input records. Ensure this 
value is high enough say: 1 partition for 1 GB of input data</span>
-
-#### withDeleteParallelism(parallelism = 1500) {#withDelteParallelism}
-Property: `hoodie.delete.shuffle.parallelism`<br/>
-<span style="color:grey">This parallelism is Used for "delete" operation while 
deduping or repartioning. </span>
-
-#### combineInput(on_insert = false, on_update=true) {#combineInput} 
-Property: `hoodie.combine.before.insert`, `hoodie.combine.before.upsert`<br/>
-<span style="color:grey">Flag which first combines the input RDD and merges 
multiple partial records into a single record before inserting or updating in 
DFS</span>
-
-#### combineDeleteInput(on_Delete = true) {#combineDeleteInput}
-Property: `hoodie.combine.before.delete`<br/>
-<span style="color:grey">Flag which first combines the input RDD and merges 
multiple partial records into a single record before deleting in DFS</span>
-
-#### withMergeAllowDuplicateOnInserts(mergeAllowDuplicateOnInserts = false) 
{#withMergeAllowDuplicateOnInserts}
-Property: `hoodie.merge.allow.duplicate.on.inserts` <br/>
-<span style="color:grey"> When enabled, will route new records as inserts and 
will not merge with existing records. 
-Result could contain duplicate entries. </span>
-
-#### withWriteStatusStorageLevel(level = MEMORY_AND_DISK_SER) 
{#withWriteStatusStorageLevel} 
-Property: `hoodie.write.status.storage.level`<br/>
-<span style="color:grey">HoodieWriteClient.insert and HoodieWriteClient.upsert 
returns a persisted RDD[WriteStatus], this is because the Client can choose to 
inspect the WriteStatus and choose and commit or not based on the failures. 
This is a configuration for the storage level for this RDD </span>
-
-#### withAutoCommit(autoCommit = true) {#withAutoCommit} 
-Property: `hoodie.auto.commit`<br/>
-<span style="color:grey">Should HoodieWriteClient autoCommit after insert and 
upsert. The client can choose to turn off auto-commit and commit on a "defined 
success condition"</span>
-
-#### withConsistencyCheckEnabled(enabled = false) 
{#withConsistencyCheckEnabled} 
-Property: `hoodie.consistency.check.enabled`<br/>
-<span style="color:grey">Should HoodieWriteClient perform additional checks to 
ensure written files' are listable on the underlying filesystem/storage. Set 
this to true, to workaround S3's eventual consistency model and ensure all data 
written as a part of a commit is faithfully available for queries. </span>
-
-#### withRollbackParallelism(rollbackParallelism = 100) 
{#withRollbackParallelism} 
-Property: `hoodie.rollback.parallelism`<br/>
-<span style="color:grey">Determines the parallelism for rollback of 
commits.</span>
-
-#### withRollbackUsingMarkers(rollbackUsingMarkers = false) 
{#withRollbackUsingMarkers} 
-Property: `hoodie.rollback.using.markers`<br/>
-<span style="color:grey">Enables a more efficient mechanism for rollbacks 
based on the marker files generated during the writes. Turned off by 
default.</span>
-
-#### withMarkersDeleteParallelism(parallelism = 100) 
{#withMarkersDeleteParallelism} 
-Property: `hoodie.markers.delete.parallelism`<br/>
-<span style="color:grey">Determines the parallelism for deleting marker 
files.</span>
+<div class="table-wrapper" markdown="block">
+
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| withPath(hoodie_base_path) | hoodie.base.path | YES | N/A | Base DFS path under which all the data partitions are created. Always prefix it explicitly with the storage scheme (e.g. hdfs://, s3://). Hudi stores all the main metadata about commits, savepoints, cleaning audit logs etc. in the .hoodie directory under the base directory. |
+| withSchema(schema_str) | hoodie.avro.schema | YES | N/A | This is the current reader avro schema for the table. This is a string of the entire schema. HoodieWriteClient uses this schema to pass on to implementations of HoodieRecordPayload to convert from the source format to avro record. This is also used when re-writing records during an update. |
+| forTable(table_name) | hoodie.table.name | YES | N/A | Table name that will be used for registering with Hive. Needs to be the same across runs. |
+| withBulkInsertParallelism(bulk_insert_parallelism) | hoodie.bulkinsert.shuffle.parallelism | NO | 1500 | Bulk insert is meant to be used for large initial imports, and this parallelism determines the initial number of files in your table. Tune this to achieve a desired optimal size during initial import. |
+| withUserDefinedBulkInsertPartitionerClass(className) | hoodie.bulkinsert.user.defined.partitioner.class | NO | Pattern like x.y.z.UserDefinedPatitionerClass | If specified, this class will be used to re-partition input records before they are inserted. |
+| withBulkInsertSortMode(mode) | hoodie.bulkinsert.sort.mode | NO | BulkInsertSortMode.GLOBAL_SORT | Sorting modes to use for sorting records for bulk insert. This is leveraged when user defined partitioner is not configured. Default is GLOBAL_SORT. Available values are - GLOBAL_SORT: this ensures best file sizes, with lowest memory overhead at cost of sorting. PARTITION_SORT: Strikes a balance by only sorting within a partition, still keeping the memory overhead of writing lowest and be [...]
+| withParallelism(insert_shuffle_parallelism, upsert_shuffle_parallelism) | 
hoodie.insert.shuffle.parallelism, hoodie.upsert.shuffle.parallelism | NO | 
insert_shuffle_parallelism = 1500, upsert_shuffle_parallelism = 1500 | Once 
data has been initially imported, this parallelism controls initial parallelism 
for reading input records. Ensure this value is high enough say: 1 partition 
for 1 GB of input data. |
+| withDeleteParallelism(parallelism) | hoodie.delete.shuffle.parallelism | NO 
| 1500 | This parallelism is Used for “delete” operation while deduping or 
repartioning. |
+| combineInput(on_insert, on_update) | hoodie.combine.before.insert, 
hoodie.combine.before.upsert | NO | on_insert = false, on_update=true | Flag 
which first combines the input RDD and merges multiple partial records into a 
single record before inserting or updating in DFS. |
+| combineDeleteInput(on_Delete) | hoodie.combine.before.delete | NO | true | 
Flag which first combines the input RDD and merges multiple partial records 
into a single record before deleting in DFS. |
+| withMergeAllowDuplicateOnInserts(mergeAllowDuplicateOnInserts） | 
hoodie.merge.allow.duplicate.on.inserts | NO | false | When enabled, will route 
new records as inserts and will not merge with existing records. Result could 
contain duplicate entries. |
+| withWriteStatusStorageLevel(level） | hoodie.write.status.storage.level | NO 
| MEMORY_AND_DISK_SER | HoodieWriteClient.insert and HoodieWriteClient.upsert 
returns a persisted RDD[WriteStatus], this is because the Client can choose to 
inspect the WriteStatus and choose and commit or not based on the failures. 
This is a configuration for the storage level for this RDD. |
+| withAutoCommit(autoCommit) | hoodie.auto.commit | NO | true | Should 
HoodieWriteClient autoCommit after insert and upsert. The client can choose to 
turn off auto-commit and commit on a “defined success condition”. |
+| withConsistencyCheckEnabled(enabled) | hoodie.consistency.check.enabled | NO 
| false | Should HoodieWriteClient perform additional checks to ensure written 
files are listable on the underlying filesystem/storage. Set this to true to 
work around S3's eventual consistency model and ensure all data written as a 
part of a commit is faithfully available for queries. |
+| withRollbackParallelism(rollbackParallelism) | hoodie.rollback.parallelism | 
NO | 100 | Determine the parallelism for rollback of commits. |
+| withRollbackUsingMarkers(rollbackUsingMarkers) | 
hoodie.rollback.using.markers | NO | false | Enables a more efficient mechanism 
for rollbacks based on the marker files generated during the writes. Turned off 
by default. |
+| withMarkersDeleteParallelism(parallelism) | 
hoodie.markers.delete.parallelism | NO | 100 | Determines the parallelism for 
deleting marker files. |
+
+</div>
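The sizing hint in the parallelism row ("1 partition for 1 GB of input data") can be turned into a small calculation. The following is only a sketch: the helper names `min_shuffle_parallelism` and `write_options` are hypothetical, and the result is a lower bound suggested by the rule of thumb, not a value Hudi computes itself.

```python
import math

GIB = 1024 ** 3  # bytes in 1 GiB, used here as the "1 GB" unit from the table


def min_shuffle_parallelism(input_bytes: int, bytes_per_partition: int = GIB) -> int:
    """Lower bound from the rule of thumb: about one partition per 1 GB of input."""
    return max(1, math.ceil(input_bytes / bytes_per_partition))


def write_options(input_bytes: int) -> dict:
    """Build the two shuffle-parallelism properties as string-valued options."""
    parallelism = str(min_shuffle_parallelism(input_bytes))
    return {
        "hoodie.insert.shuffle.parallelism": parallelism,
        "hoodie.upsert.shuffle.parallelism": parallelism,
    }


# Example: a batch of roughly 250 GB of input data.
opts = write_options(250 * GIB)
```

With PySpark, a dictionary like `opts` would typically be passed through `DataFrameWriter.options(**opts)` alongside the other write options.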
 
 ### Index configs
 Following configs control indexing behavior, which tags incoming records as 
either inserts or updates to older records. 
@@ -367,346 +259,185 @@ Following configs control indexing behavior, which tags 
incoming records as eith
 [withIndexConfig](#index-configs) (HoodieIndexConfig) <br/>
 <span style="color:grey">This is pluggable to have a external index (HBase) or 
use the default bloom filter stored in the Parquet files</span>
 
-#### withIndexClass(indexClass = "x.y.z.UserDefinedIndex") {#withIndexClass}
-Property: `hoodie.index.class` <br/>
-<span style="color:grey">Full path of user-defined index class and must be a 
subclass of HoodieIndex class. It will take precedence over the 
`hoodie.index.type` configuration if specified</span>
-
-#### withIndexType(indexType = BLOOM) {#withIndexType}
-Property: `hoodie.index.type` <br/>
-<span style="color:grey">Type of index to use. Default is Bloom filter. 
Possible options are [BLOOM | GLOBAL_BLOOM |SIMPLE | GLOBAL_SIMPLE | INMEMORY | 
HBASE]. Bloom filters removes the dependency on a external system and is stored 
in the footer of the Parquet Data Files</span>
-
-#### Bloom Index configs
-
-#### bloomIndexFilterType(bucketizedChecking = BloomFilterTypeCode.SIMPLE) 
{#bloomIndexFilterType}
-Property: `hoodie.bloom.index.filter.type` <br/>
-<span style="color:grey">Filter type used. Default is 
BloomFilterTypeCode.SIMPLE. Available values are [BloomFilterTypeCode.SIMPLE , 
BloomFilterTypeCode.DYNAMIC_V0]. Dynamic bloom filters auto size themselves 
based on number of keys.</span>
-
-#### bloomFilterNumEntries(numEntries = 60000) {#bloomFilterNumEntries}
-Property: `hoodie.index.bloom.num_entries` <br/>
-<span style="color:grey">Only applies if index type is BLOOM. <br/>This is the 
number of entries to be stored in the bloom filter. We assume the 
maxParquetFileSize is 128MB and averageRecordSize is 1024B and hence we approx 
a total of 130K records in a file. The default (60000) is roughly half of this 
approximation. [HUDI-56](https://issues.apache.org/jira/browse/HUDI-56) tracks 
computing this dynamically. Warning: Setting this very low, will generate a lot 
of false positives and index l [...]
-
-#### bloomFilterFPP(fpp = 0.000000001) {#bloomFilterFPP}
-Property: `hoodie.index.bloom.fpp` <br/>
-<span style="color:grey">Only applies if index type is BLOOM. <br/> Error rate 
allowed given the number of entries. This is used to calculate how many bits 
should be assigned for the bloom filter and the number of hash functions. This 
is usually set very low (default: 0.000000001), we like to tradeoff disk space 
for lower false positives. If the number of entries added to bloom filter 
exceeds the congfigured value (`hoodie.index.bloom.num_entries`), then this fpp 
may not be honored.</span>
+<div class="table-wrapper" markdown="block">
 
-#### bloomIndexParallelism(0) {#bloomIndexParallelism}
-Property: `hoodie.bloom.index.parallelism` <br/>
-<span style="color:grey">Only applies if index type is BLOOM. <br/> This is 
the amount of parallelism for index lookup, which involves a Spark Shuffle. By 
default, this is auto computed based on input workload characteristics</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| withIndexClass(indexClass) | hoodie.index.class | NO | Index class path, 
like x.y.z.UserDefinedIndex | Full path of the user-defined index class, which 
must be a subclass of the HoodieIndex class. It takes precedence over the 
hoodie.index.type configuration if specified. |
+| withIndexType(indexType) | hoodie.index.type | NO | BLOOM | Type of index to 
use. Default is Bloom filter. Possible options are [BLOOM, GLOBAL_BLOOM, 
SIMPLE, GLOBAL_SIMPLE, INMEMORY, HBASE]. Bloom filters remove the dependency 
on an external system and are stored in the footer of the Parquet data files. |
 
-#### bloomIndexPruneByRanges(pruneRanges = true) {#bloomIndexPruneByRanges}
-Property: `hoodie.bloom.index.prune.by.ranges` <br/>
-<span style="color:grey">Only applies if index type is BLOOM. <br/> When true, 
range information from files to leveraged speed up index lookups. Particularly 
helpful, if the key has a monotonously increasing prefix, such as timestamp. If 
the record key is completely random, it is better to turn this off.</span>
+</div>
 
-#### bloomIndexUseCaching(useCaching = true) {#bloomIndexUseCaching}
-Property: `hoodie.bloom.index.use.caching` <br/>
-<span style="color:grey">Only applies if index type is BLOOM. <br/> When true, 
the input RDD will cached to speed up index lookup by reducing IO for computing 
parallelism or affected partitions</span>
-
-#### bloomIndexTreebasedFilter(useTreeFilter = true) 
{#bloomIndexTreebasedFilter}
-Property: `hoodie.bloom.index.use.treebased.filter` <br/>
-<span style="color:grey">Only applies if index type is BLOOM. <br/> When true, 
interval tree based file pruning optimization is enabled. This mode speeds-up 
file-pruning based on key ranges when compared with the brute-force mode</span>
-
-#### bloomIndexBucketizedChecking(bucketizedChecking = true) 
{#bloomIndexBucketizedChecking}
-Property: `hoodie.bloom.index.bucketized.checking` <br/>
-<span style="color:grey">Only applies if index type is BLOOM. <br/> When true, 
bucketized bloom filtering is enabled. This reduces skew seen in sort based 
bloom index lookup</span>
-
-#### bloomIndexFilterDynamicMaxEntries(maxNumberOfEntries = 100000) 
{#bloomIndexFilterDynamicMaxEntries}
-Property: `hoodie.bloom.index.filter.dynamic.max.entries` <br/>
-<span style="color:grey">The threshold for the maximum number of keys to 
record in a dynamic Bloom filter row. Only applies if filter type is 
BloomFilterTypeCode.DYNAMIC_V0.</span>
-
-#### bloomIndexKeysPerBucket(keysPerBucket = 10000000) 
{#bloomIndexKeysPerBucket}
-Property: `hoodie.bloom.index.keys.per.bucket` <br/>
-<span style="color:grey">Only applies if bloomIndexBucketizedChecking is 
enabled and index type is bloom. <br/> This configuration controls the "bucket" 
size which tracks the number of record-key checks made against a single file 
and is the unit of work allocated to each partition performing bloom filter 
lookup. A higher value would amortize the fixed cost of reading a bloom filter 
to memory. </span>
-
-##### withBloomIndexInputStorageLevel(level = MEMORY_AND_DISK_SER) 
{#withBloomIndexInputStorageLevel}
-Property: `hoodie.bloom.index.input.storage.level` <br/>
-<span style="color:grey">Only applies when 
[#bloomIndexUseCaching](#bloomIndexUseCaching) is set. Determine what level of 
persistence is used to cache input RDDs.<br/> Refer to 
org.apache.spark.storage.StorageLevel for different values</span>
+#### Bloom Index configs
 
-##### bloomIndexUpdatePartitionPath(updatePartitionPath = false) 
{#bloomIndexUpdatePartitionPath}
-Property: `hoodie.bloom.index.update.partition.path` <br/>
-<span style="color:grey">Only applies if index type is GLOBAL_BLOOM. <br/>When 
set to true, an update including the partition path of a record that already 
exists will result in inserting the incoming record into the new partition and 
deleting the original record in the old partition. When set to false, the 
original record will only be updated in the old partition.</span>
+<div class="table-wrapper" markdown="block">
+
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| bloomIndexFilterType(bucketizedChecking) | hoodie.bloom.index.filter.type | 
NO | BloomFilterTypeCode.SIMPLE | Filter type used. Default is 
BloomFilterTypeCode.SIMPLE. Available values are [BloomFilterTypeCode.SIMPLE, 
BloomFilterTypeCode.DYNAMIC_V0]. Dynamic bloom filters auto-size themselves 
based on the number of keys. |
+| bloomFilterNumEntries(numEntries) | hoodie.index.bloom.num_entries | NO | 
60000 | Only applies if index type is BLOOM. <br/>This is the number of entries 
to be stored in the bloom filter. We assume the maxParquetFileSize is 128MB and 
averageRecordSize is 1024B, and hence approximate a total of 130K records in a 
file. The default (60000) is roughly half of this approximation. 
[HUDI-56](https://issues.apache.org/jira/browse/HUDI-56) tracks computing this 
dynamically. Warning: Setting this v [...]
+| bloomFilterFPP(fpp) | hoodie.index.bloom.fpp | NO | 0.000000001 | Only 
applies if index type is BLOOM. Error rate allowed given the number of entries. 
This is used to calculate how many bits should be assigned for the bloom filter 
and the number of hash functions. This is usually set very low (default: 
0.000000001); we like to trade off disk space for lower false positives. If the 
number of entries added to bloom filter exceeds the configured value 
(hoodie.index.bloom.num_entries), then [...]
+| bloomIndexParallelism(parallelism) | hoodie.bloom.index.parallelism | NO | 0 
| Only applies if index type is BLOOM. This is the amount of parallelism for 
index lookup, which involves a Spark Shuffle. By default, this is auto computed 
based on input workload characteristics. |
+| bloomIndexPruneByRanges(pruneRanges) | hoodie.bloom.index.prune.by.ranges | 
NO | true | Only applies if index type is BLOOM. When true, range information 
from files is leveraged to speed up index lookups. Particularly helpful if the 
key has a monotonically increasing prefix, such as a timestamp. If the record 
key is completely random, it is better to turn this off. |
+| bloomIndexUseCaching(useCaching) | hoodie.bloom.index.use.caching | NO | 
true | Only applies if index type is BLOOM. When true, the input RDD will be 
cached to speed up index lookup by reducing IO for computing parallelism or 
affected partitions. |
+| bloomIndexTreebasedFilter(useTreeFilter) | 
hoodie.bloom.index.use.treebased.filter | NO | true | When true, interval tree 
based file pruning optimization is enabled. This mode speeds up file pruning 
based on key ranges when compared with the brute-force mode. |
+| bloomIndexBucketizedChecking(bucketizedChecking) | 
hoodie.bloom.index.bucketized.checking | NO | true | When true, bucketized 
bloom filtering is enabled. This reduces skew seen in sort based bloom index 
lookup. |
+| bloomIndexFilterDynamicMaxEntries(maxNumberOfEntries) | 
hoodie.bloom.index.filter.dynamic.max.entries | NO | 100000 | The threshold for 
the maximum number of keys to record in a dynamic Bloom filter row. Only 
applies if filter type is BloomFilterTypeCode.DYNAMIC_V0. |
+| bloomIndexKeysPerBucket(keysPerBucket) | hoodie.bloom.index.keys.per.bucket 
| NO | 10000000 | Only applies if bloomIndexBucketizedChecking is enabled and 
index type is bloom. This configuration controls the “bucket” size which tracks 
the number of record-key checks made against a single file and is the unit of 
work allocated to each partition performing bloom filter lookup. A higher value 
would amortize the fixed cost of reading a bloom filter to memory. |
+| withBloomIndexInputStorageLevel(level) | 
hoodie.bloom.index.input.storage.level | NO | MEMORY_AND_DISK_SER | Only 
applies when bloomIndexUseCaching is set. Determine what level of persistence 
is used to cache input RDDs. Refer to org.apache.spark.storage.StorageLevel for 
different values. |
+| bloomIndexUpdatePartitionPath(updatePartitionPath) | 
hoodie.bloom.index.update.partition.path | NO | false | Only applies if index 
type is GLOBAL_BLOOM. When set to true, an update including the partition path 
of a record that already exists will result in inserting the incoming record 
into the new partition and deleting the original record in the old partition. 
When set to false, the original record will only be updated in the old 
partition. |
+
+</div>
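The arithmetic behind the `hoodie.index.bloom.num_entries` and `hoodie.index.bloom.fpp` defaults can be checked with a short calculation. The sketch below uses the textbook Bloom filter sizing formulas (bits m = -n ln p / (ln 2)^2, hash functions k = (m / n) ln 2); this is an assumption for illustration, and Hudi's own implementation may size its filters differently. The function name `bloom_filter_size` is hypothetical.

```python
import math

# Record-count estimate quoted in the table: 128MB file / 1024B average record.
max_parquet_file_size = 128 * 1024 * 1024
average_record_size = 1024
records_per_file = max_parquet_file_size // average_record_size  # about 130K


def bloom_filter_size(num_entries: int, fpp: float) -> tuple:
    """Textbook sizing: returns (number of bits, number of hash functions)."""
    bits = math.ceil(-num_entries * math.log(fpp) / (math.log(2) ** 2))
    hashes = max(1, round(bits / num_entries * math.log(2)))
    return bits, hashes


# Defaults from the table: 60000 entries, false positive probability 1e-9.
bits, hashes = bloom_filter_size(60000, 0.000000001)
kib_per_file = bits / 8 / 1024  # approximate filter footprint per file
```

Under these formulas the default settings cost a few hundred KiB of filter per file, which illustrates the disk-space-for-accuracy tradeoff the fpp row describes.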
 
 #### HBase Index configs
 
-#### hbaseZkQuorum(zkString) [Required] {#hbaseZkQuorum}  
-Property: `hoodie.index.hbase.zkquorum` <br/>
-<span style="color:grey">Only applies if index type is HBASE. HBase ZK Quorum 
url to connect to.</span>
-
-#### hbaseZkPort(port) [Required] {#hbaseZkPort}  
-Property: `hoodie.index.hbase.zkport` <br/>
-<span style="color:grey">Only applies if index type is HBASE. HBase ZK Quorum 
port to connect to.</span>
+<div class="table-wrapper" markdown="block">
 
-#### hbaseZkZnodeParent(zkZnodeParent)  [Required] {#hbaseTableName}
-Property: `hoodie.index.hbase.zknode.path` <br/>
-<span style="color:grey">Only applies if index type is HBASE. This is the root 
znode that will contain all the znodes created/used by HBase.</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| hbaseZkQuorum(zkString) | hoodie.index.hbase.zkquorum | YES | N/A | Only 
applies if index type is HBASE. HBase ZK Quorum url to connect to. |
+| hbaseZkPort(port) | hoodie.index.hbase.zkport | YES | N/A | Only applies if 
index type is HBASE. HBase ZK Quorum port to connect to. |
+| hbaseZkZnodeParent(zkZnodeParent) | hoodie.index.hbase.zknode.path | YES | 
N/A | Only applies if index type is HBASE. This is the root znode that will 
contain all the znodes created/used by HBase. |
+| hbaseTableName(tableName) | hoodie.index.hbase.table | YES | N/A | Only 
applies if index type is HBASE. HBase Table name to use as the index. Hudi 
stores the row_key and [partition_path, fileID, commitTime] mapping in the 
table. |
+| hbaseIndexUpdatePartitionPath(updatePartitionPath) | 
hoodie.hbase.index.update.partition.path | NO | false | Only applies if index 
type is HBASE. When an already existing record is upserted to a partition 
different from the one in storage, this config, when set to true, deletes the 
old record in the old partition and inserts it as a new record in the new 
partition. |
 
-#### hbaseTableName(tableName)  [Required] {#hbaseTableName}
-Property: `hoodie.index.hbase.table` <br/>
-<span style="color:grey">Only applies if index type is HBASE. HBase Table name 
to use as the index. Hudi stores the row_key and [partition_path, fileID, 
commitTime] mapping in the table.</span>
-
-#### hbaseIndexUpdatePartitionPath(updatePartitionPath) 
{#hbaseIndexUpdatePartitionPath}
-Property: `hoodie.hbase.index.update.partition.path` <br/>
-<span style="color:grey">Only applies if index type is HBASE. When an already 
existing record is upserted to a new partition compared to whats in storage, 
this config when set, will delete old record in old paritition and will insert 
it as new record in new partition. </span>
+</div>
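Since four of the HBase index properties are marked as required, it can help to validate an options dictionary before handing it to a writer. This is a minimal sketch: the helper `missing_hbase_index_keys` is hypothetical, and the host names, znode path, and table name are placeholder values, not defaults.

```python
# Properties the table marks as required when the index type is HBASE.
REQUIRED_HBASE_INDEX_KEYS = (
    "hoodie.index.hbase.zkquorum",
    "hoodie.index.hbase.zkport",
    "hoodie.index.hbase.zknode.path",
    "hoodie.index.hbase.table",
)


def missing_hbase_index_keys(options: dict) -> list:
    """Return required HBase index properties that are absent or empty."""
    if options.get("hoodie.index.type") != "HBASE":
        return []  # the requirements only apply to the HBASE index type
    return [key for key in REQUIRED_HBASE_INDEX_KEYS if not options.get(key)]


hbase_index_options = {
    "hoodie.index.type": "HBASE",
    "hoodie.index.hbase.zkquorum": "zk1.example.com,zk2.example.com",  # placeholder
    "hoodie.index.hbase.zkport": "2181",  # placeholder
    "hoodie.index.hbase.zknode.path": "/hbase",  # placeholder
    "hoodie.index.hbase.table": "hudi_index",  # placeholder
    "hoodie.hbase.index.update.partition.path": "false",
}

missing = missing_hbase_index_keys(hbase_index_options)
```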
 
 #### Simple Index configs
 
-#### simpleIndexUseCaching(useCaching = true) {#simpleIndexUseCaching}
-Property: `hoodie.simple.index.use.caching` <br/>
-<span style="color:grey">Only applies if index type is SIMPLE. <br/> When 
true, the input RDD will cached to speed up index lookup by reducing IO for 
computing parallelism or affected partitions</span>
-
-##### withSimpleIndexInputStorageLevel(level = MEMORY_AND_DISK_SER) 
{#withSimpleIndexInputStorageLevel}
-Property: `hoodie.simple.index.input.storage.level` <br/>
-<span style="color:grey">Only applies when 
[#simpleIndexUseCaching](#simpleIndexUseCaching) is set. Determine what level 
of persistence is used to cache input RDDs.<br/> Refer to 
org.apache.spark.storage.StorageLevel for different values</span>
+<div class="table-wrapper" markdown="block">
 
-#### withSimpleIndexParallelism(parallelism = 50) {#withSimpleIndexParallelism}
-Property: `hoodie.simple.index.parallelism` <br/>
-<span style="color:grey">Only applies if index type is SIMPLE. <br/> This is 
the amount of parallelism for index lookup, which involves a Spark 
Shuffle.</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| simpleIndexUseCaching(useCaching) | hoodie.simple.index.use.caching | NO | 
true | Only applies if index type is SIMPLE. When true, the input RDD will be 
cached to speed up index lookup by reducing IO for computing parallelism or 
affected partitions. |
+| withSimpleIndexInputStorageLevel(level) | 
hoodie.simple.index.input.storage.level | NO | MEMORY_AND_DISK_SER | Only 
applies when simpleIndexUseCaching is set. Determine what level of persistence 
is used to cache input RDDs. Refer to org.apache.spark.storage.StorageLevel for 
different values. |
+| withSimpleIndexParallelism(parallelism) | hoodie.simple.index.parallelism | 
NO | 50 | Only applies if index type is SIMPLE. This is the amount of 
parallelism for index lookup, which involves a Spark Shuffle. |
+| withGlobalSimpleIndexParallelism(parallelism) | 
hoodie.global.simple.index.parallelism | NO | 100 | Only applies if index type 
is GLOBAL_SIMPLE. This is the amount of parallelism for index lookup, which 
involves a Spark Shuffle. |
 
-#### withGlobalSimpleIndexParallelism(parallelism = 100) 
{#withGlobalSimpleIndexParallelism}
-Property: `hoodie.global.simple.index.parallelism` <br/>
-<span style="color:grey">Only applies if index type is GLOBAL_SIMPLE. <br/> 
This is the amount of parallelism for index lookup, which involves a Spark 
Shuffle.</span>
+</div>
 
 ### Storage configs
 Controls aspects around sizing parquet and log files.
 
 [withStorageConfig](#withStorageConfig) (HoodieStorageConfig) <br/>
 
-#### limitFileSize (size = 120MB) {#limitFileSize}
-Property: `hoodie.parquet.max.file.size` <br/>
-<span style="color:grey">Target size for parquet files produced by Hudi write 
phases. For DFS, this needs to be aligned with the underlying filesystem block 
size for optimal performance. </span>
-
-#### parquetBlockSize(rowgroupsize = 120MB) {#parquetBlockSize} 
-Property: `hoodie.parquet.block.size` <br/>
-<span style="color:grey">Parquet RowGroup size. Its better this is same as the 
file size, so that a single column within a file is stored continuously on 
disk</span>
-
-#### parquetPageSize(pagesize = 1MB) {#parquetPageSize} 
-Property: `hoodie.parquet.page.size` <br/>
-<span style="color:grey">Parquet page size. Page is the unit of read within a 
parquet file. Within a block, pages are compressed seperately. </span>
+<div class="table-wrapper" markdown="block">
 
-#### parquetCompressionRatio(parquetCompressionRatio = 0.1) 
{#parquetCompressionRatio} 
-Property: `hoodie.parquet.compression.ratio` <br/>
-<span style="color:grey">Expected compression of parquet data used by Hudi, 
when it tries to size new parquet files. Increase this value, if bulk_insert is 
producing smaller than expected sized files</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| limitFileSize(size) | hoodie.parquet.max.file.size | NO | 125829120(120MB) | 
Target size for parquet files produced by Hudi write phases. For DFS, this 
needs to be aligned with the underlying filesystem block size for optimal 
performance. |
+| parquetBlockSize(rowgroupsize) | hoodie.parquet.block.size | NO | 
125829120(120MB) | Parquet RowGroup size. It is better for this to be the same 
as the file size, so that a single column within a file is stored contiguously 
on disk. |
+| parquetPageSize(pagesize) | hoodie.parquet.page.size | NO | 1048576(1MB) | 
Parquet page size. Page is the unit of read within a parquet file. Within a 
block, pages are compressed separately. |
+| parquetCompressionRatio(parquetCompressionRatio) | 
hoodie.parquet.compression.ratio | NO | 0.1 | Expected compression of parquet 
data used by Hudi, when it tries to size new parquet files. Increase this 
value, if bulk_insert is producing smaller than expected sized files. |
+| parquetCompressionCodec(parquetCompressionCodec) | 
hoodie.parquet.compression.codec | NO | gzip | Parquet compression codec name. 
Default is gzip. Possible options are [gzip, snappy, uncompressed, lzo]. |
+| logFileMaxSize(logFileSize) | hoodie.logfile.max.size | NO | 1073741824(1GB) 
| LogFile max size. This is the maximum size allowed for a log file before it 
is rolled over to the next version. |
+| logFileDataBlockMaxSize(dataBlockSize) | hoodie.logfile.data.block.max.size 
| NO | 268435456(256MB) | LogFile Data block max size. This is the maximum size 
allowed for a single data block to be appended to a log file. This helps to 
make sure the data appended to the log file is broken up into sizable blocks to 
prevent OOM errors. This size should be greater than the JVM memory. |
+| logFileToParquetCompressionRatio(logFileToParquetCompressionRatio) | 
hoodie.logfile.to.parquet.compression.ratio | NO | 0.35 | Expected additional 
compression as records move from log files to parquet. Used for merge_on_read 
table to send inserts into log files & control the size of compacted parquet 
file. |
 
-#### parquetCompressionCodec(parquetCompressionCodec = gzip) 
{#parquetCompressionCodec}
-Property: `hoodie.parquet.compression.codec` <br/>
-<span style="color:grey">Parquet compression codec name. Default is gzip. 
Possible options are [gzip | snappy | uncompressed | lzo]</span>
-
-#### logFileMaxSize(logFileSize = 1GB) {#logFileMaxSize} 
-Property: `hoodie.logfile.max.size` <br/>
-<span style="color:grey">LogFile max size. This is the maximum size allowed 
for a log file before it is rolled over to the next version. </span>
-
-#### logFileDataBlockMaxSize(dataBlockSize = 256MB) {#logFileDataBlockMaxSize} 
-Property: `hoodie.logfile.data.block.max.size` <br/>
-<span style="color:grey">LogFile Data block max size. This is the maximum size 
allowed for a single data block to be appended to a log file. This helps to 
make sure the data appended to the log file is broken up into sizable blocks to 
prevent from OOM errors. This size should be greater than the JVM memory. 
</span>
-
-#### logFileToParquetCompressionRatio(logFileToParquetCompressionRatio = 0.35) 
{#logFileToParquetCompressionRatio} 
-Property: `hoodie.logfile.to.parquet.compression.ratio` <br/>
-<span style="color:grey">Expected additional compression as records move from 
log files to parquet. Used for merge_on_read table to send inserts into log 
files & control the size of compacted parquet file.</span>
- 
-#### parquetCompressionCodec(parquetCompressionCodec = gzip) 
{#parquetCompressionCodec} 
-Property: `hoodie.parquet.compression.codec` <br/>
-<span style="color:grey">Compression Codec for parquet files </span>
+</div>
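The size defaults in the storage table are given in bytes with a human-readable size in parentheses. The sketch below reproduces those byte values from the MB and GB figures, assuming binary units (1 MB = 1024 * 1024 bytes), which is what the listed numbers imply. The dictionary name `storage_options` is illustrative.

```python
MB = 1024 * 1024
GB = 1024 * MB

# Size-related storage properties, expressed as string-valued options in bytes.
storage_options = {
    "hoodie.parquet.max.file.size": str(120 * MB),
    "hoodie.parquet.block.size": str(120 * MB),
    "hoodie.parquet.page.size": str(1 * MB),
    "hoodie.logfile.max.size": str(1 * GB),
    "hoodie.logfile.data.block.max.size": str(256 * MB),
    # Ratio-valued and codec properties from the same table, shown for completeness.
    "hoodie.parquet.compression.ratio": "0.1",
    "hoodie.parquet.compression.codec": "gzip",
    "hoodie.logfile.to.parquet.compression.ratio": "0.35",
}
```

Deriving the byte counts from named units avoids transcription mistakes when overriding several size limits at once.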
 
 ### Compaction configs
 Configs that control compaction (merging of log files onto a new parquet base 
file), cleaning (reclamation of older/unused file groups).
 [withCompactionConfig](#withCompactionConfig) (HoodieCompactionConfig) <br/>
 
-#### withCleanerPolicy(policy = KEEP_LATEST_COMMITS) {#withCleanerPolicy} 
-Property: `hoodie.cleaner.policy` <br/>
-<span style="color:grey"> Cleaning policy to be used. Hudi will delete older 
versions of parquet files to re-claim space. Any Query/Computation referring to 
this version of the file will fail. It is good to make sure that the data is 
retained for more than the maximum query execution time.</span>
-
-#### withFailedWritesCleaningPolicy(policy = 
HoodieFailedWritesCleaningPolicy.EAGER) {#withFailedWritesCleaningPolicy}
-Property: `hoodie.cleaner.policy.failed.writes` <br/>
-<span style="color:grey"> Cleaning policy for failed writes to be used. Hudi 
will delete any files written by failed writes to re-claim space. Choose to 
perform this rollback of failed writes `eagerly` before every writer starts 
(only supported for single writer) or `lazily` by the cleaner (required for 
multi-writers)</span>
-
-#### retainCommits(no_of_commits_to_retain = 24) {#retainCommits} 
-Property: `hoodie.cleaner.commits.retained` <br/>
-<span style="color:grey">Number of commits to retain. So data will be retained 
for num_of_commits * time_between_commits (scheduled). This also directly 
translates into how much you can incrementally pull on this table</span>
-
-#### withAutoClean(autoClean = true) {#withAutoClean} 
-Property: `hoodie.clean.automatic` <br/>
-<span style="color:grey">Should cleanup if there is anything to cleanup 
immediately after the commit</span>
-
-#### withAsyncClean(asyncClean = false) {#withAsyncClean} 
-Property: `hoodie.clean.async` <br/>
-<span style="color:grey">Only applies when [#withAutoClean](#withAutoClean) is 
turned on. When turned on runs cleaner async with writing. </span>
-
-#### archiveCommitsWith(minCommits = 96, maxCommits = 128) 
{#archiveCommitsWith} 
-Property: `hoodie.keep.min.commits`, `hoodie.keep.max.commits` <br/>
-<span style="color:grey">Each commit is a small file in the `.hoodie` 
directory. Since DFS typically does not favor lots of small files, Hudi 
archives older commits into a sequential log. A commit is published atomically 
by a rename of the commit file.</span>
-
-#### withCommitsArchivalBatchSize(batch = 10) {#withCommitsArchivalBatchSize}
-Property: `hoodie.commits.archival.batch` <br/>
-<span style="color:grey">This controls the number of commit instants read in 
memory as a batch and archived together.</span>
-
-#### compactionSmallFileSize(size = 100MB) {#compactionSmallFileSize} 
-Property: `hoodie.parquet.small.file.limit` <br/>
-<span style="color:grey">This should be less < maxFileSize and setting it to 
0, turns off this feature. Small files can always happen because of the number 
of insert records in a partition in a batch. Hudi has an option to auto-resolve 
small files by masking inserts into this partition as updates to existing small 
files. The size here is the minimum file size considered as a "small file 
size".</span>
-
-#### insertSplitSize(size = 500000) {#insertSplitSize} 
-Property: `hoodie.copyonwrite.insert.split.size` <br/>
-<span style="color:grey">Insert Write Parallelism. Number of inserts grouped 
for a single partition. Writing out 100MB files, with atleast 1kb records, 
means 100K records per file. Default is to overprovision to 500K. To improve 
insert latency, tune this to match the number of records in a single file. 
Setting this to a low number, will result in small files (particularly when 
compactionSmallFileSize is 0)</span>
-
-#### autoTuneInsertSplits(true) {#autoTuneInsertSplits} 
-Property: `hoodie.copyonwrite.insert.auto.split` <br/>
-<span style="color:grey">Should hudi dynamically compute the insertSplitSize 
based on the last 24 commit's metadata. Turned on by default. </span>
-
-#### approxRecordSize(size = 1024) {#approxRecordSize} 
-Property: `hoodie.copyonwrite.record.size.estimate` <br/>
-<span style="color:grey">The average record size. If specified, hudi will use 
this and not compute dynamically based on the last 24 commit's metadata. No 
value set as default. This is critical in computing the insert parallelism and 
bin-packing inserts into small files. See above.</span>
-
-#### withInlineCompaction(inlineCompaction = false) {#withInlineCompaction} 
-Property: `hoodie.compact.inline` <br/>
-<span style="color:grey">When set to true, compaction is triggered by the 
ingestion itself, right after a commit/deltacommit action as part of 
insert/upsert/bulk_insert</span>
-
-#### withMaxNumDeltaCommitsBeforeCompaction(maxNumDeltaCommitsBeforeCompaction 
= 10) {#withMaxNumDeltaCommitsBeforeCompaction} 
-Property: `hoodie.compact.inline.max.delta.commits` <br/>
-<span style="color:grey">Number of max delta commits to keep before triggering 
an inline compaction</span>
-
-#### withCompactionLazyBlockReadEnabled(true) 
{#withCompactionLazyBlockReadEnabled} 
-Property: `hoodie.compaction.lazy.block.read` <br/>
-<span style="color:grey">When a CompactedLogScanner merges all log files, this 
config helps to choose whether the logblocks should be read lazily or not. 
Choose true to use I/O intensive lazy block reading (low memory usage) or false 
for Memory intensive immediate block read (high memory usage)</span>
-
-#### withCompactionReverseLogReadEnabled(false) 
{#withCompactionReverseLogReadEnabled} 
-Property: `hoodie.compaction.reverse.log.read` <br/>
-<span style="color:grey">HoodieLogFormatReader reads a logfile in the forward 
direction starting from pos=0 to pos=file_length. If this config is set to 
true, the Reader reads the logfile in reverse direction, from pos=file_length 
to pos=0</span>
-
-#### withCleanerParallelism(cleanerParallelism = 200) 
{#withCleanerParallelism} 
-Property: `hoodie.cleaner.parallelism` <br/>
-<span style="color:grey">Increase this if cleaning becomes slow.</span>
-
-#### withCompactionStrategy(compactionStrategy = 
org.apache.hudi.io.compact.strategy.LogFileSizeBasedCompactionStrategy) 
{#withCompactionStrategy} 
-Property: `hoodie.compaction.strategy` <br/>
-<span style="color:grey">Compaction strategy decides which file groups are 
picked up for compaction during each compaction run. By default. Hudi picks the 
log file with most accumulated unmerged data</span>
-
-#### withTargetIOPerCompactionInMB(targetIOPerCompactionInMB = 500000) 
{#withTargetIOPerCompactionInMB} 
-Property: `hoodie.compaction.target.io` <br/>
-<span style="color:grey">Amount of MBs to spend during compaction run for the 
LogFileSizeBasedCompactionStrategy. This value helps bound ingestion latency 
while compaction is run inline mode.</span>
-
-#### withTargetPartitionsPerDayBasedCompaction(targetPartitionsPerCompaction = 
10) {#withTargetPartitionsPerDayBasedCompaction} 
-Property: `hoodie.compaction.daybased.target` <br/>
-<span style="color:grey">Used by 
org.apache.hudi.io.compact.strategy.DayBasedCompactionStrategy to denote the 
number of latest partitions to compact during a compaction run.</span>    
-
-#### withPayloadClass(payloadClassName = 
org.apache.hudi.common.model.HoodieAvroPayload) {#payloadClassName} 
-Property: `hoodie.compaction.payload.class` <br/>
-<span style="color:grey">This needs to be same as class used during 
insert/upserts. Just like writing, compaction also uses the record payload 
class to merge records in the log against each other, merge again with the base 
file and produce the final record to be written after compaction.</span>
+<div class="table-wrapper" markdown="block">
+
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| withCleanerPolicy(policy) | hoodie.cleaner.policy | NO | KEEP_LATEST_COMMITS 
| Cleaning policy to be used. Hudi will delete older versions of parquet files 
to re-claim space. Any Query/Computation referring to this version of the file 
will fail. It is good to make sure that the data is retained for more than the 
maximum query execution time. |
+| withFailedWritesCleaningPolicy(policy) | hoodie.cleaner.policy.failed.writes 
| NO | HoodieFailedWritesCleaningPolicy.EAGER | Cleaning policy for failed 
writes to be used. Hudi will delete any files written by failed writes to 
re-claim space. Choose to perform this rollback of failed writes eagerly before 
every writer starts (only supported for single writer) or lazily by the cleaner 
(required for multi-writers) |
+| retainCommits(num_of_commits_to_retain) | hoodie.cleaner.commits.retained | 
NO | 24 | Number of commits to retain. So data will be retained for 
num_of_commits * time_between_commits (scheduled). This also directly 
translates into how much you can incrementally pull on this table. |
+| withAutoClean(autoClean) | hoodie.clean.automatic | NO | true | Whether to 
clean up immediately after each commit, if there is anything to clean. |
+| withAsyncClean(asyncClean) | hoodie.clean.async | NO | false | Only applies 
when withAutoClean is turned on. When true, the cleaner runs asynchronously 
with writing. |
+| archiveCommitsWith(minCommits, maxCommits) | hoodie.keep.min.commits, 
hoodie.keep.max.commits | NO | hoodie.keep.min.commits = 96, 
hoodie.keep.max.commits = 128 | Each commit is a small file in the .hoodie 
directory. Since DFS typically does not favor lots of small files, Hudi 
archives older commits into a sequential log. A commit is published atomically 
by a rename of the commit file. |
+| withCommitsArchivalBatchSize(batch) | hoodie.commits.archival.batch | NO | 
10 | This controls the number of commit instants read in memory as a batch and 
archived together. |
+| compactionSmallFileSize(size) | hoodie.parquet.small.file.limit | NO | 
104857600(100MB) | This should be less than maxFileSize; setting it to 0 turns 
off this feature. Small files can always happen because of the number of insert 
records in a partition in a batch. Hudi has an option to auto-resolve small 
files by masking inserts into this partition as updates to existing small 
files. The size here is the threshold below which a file is considered a 
“small file”. |
+| insertSplitSize(size) | hoodie.copyonwrite.insert.split.size | NO | 500000 | 
Insert Write Parallelism. Number of inserts grouped for a single partition. 
Writing out 100MB files, with at least 1KB records, means 100K records per 
file. Default is to overprovision to 500K. To improve insert latency, tune this 
to match the number of records in a single file. Setting this to a low number 
will result in small files (particularly when compactionSmallFileSize is 0). |
+| autoTuneInsertSplits(autoSplit) | hoodie.copyonwrite.insert.auto.split | NO 
| true | Whether Hudi should dynamically compute the insertSplitSize based on 
the metadata of the last 24 commits. Turned on by default. |
+| approxRecordSize(size) | hoodie.copyonwrite.record.size.estimate | NO | 1024 
| The average record size. If specified, Hudi will use this and not compute it 
dynamically based on the metadata of the last 24 commits. This is critical in 
computing the insert parallelism and bin-packing inserts into small files. See 
above. |
+| withInlineCompaction(inlineCompaction) | hoodie.compact.inline | NO | false 
| When set to true, compaction is triggered by the ingestion itself, right 
after a commit/deltacommit action as part of insert/upsert/bulk_insert. |
+| withMaxNumDeltaCommitsBeforeCompaction(maxNumDeltaCommitsBeforeCompaction) | 
hoodie.compact.inline.max.delta.commits | NO | 10 | Maximum number of delta 
commits to keep before triggering an inline compaction. |
+| withCompactionLazyBlockReadEnabled(CompactionLazyBlockRead) | 
hoodie.compaction.lazy.block.read | NO | true | When a CompactedLogScanner 
merges all log files, this config helps to choose whether the logblocks should 
be read lazily or not. Choose true to use I/O intensive lazy block reading (low 
memory usage) or false for Memory intensive immediate block read (high memory 
usage). |
+| withCompactionReverseLogReadEnabled(CompactionReverseLog) | 
hoodie.compaction.reverse.log.read | NO | false | HoodieLogFormatReader reads a 
logfile in the forward direction starting from pos=0 to pos=file_length. If 
this config is set to true, the Reader reads the logfile in reverse direction, 
from pos=file_length to pos=0. |
+| withCleanerParallelism(cleanerParallelism) | hoodie.cleaner.parallelism | NO 
| 200 | Increase this if cleaning becomes slow. |
+| withCompactionStrategy(compactionStrategy) | hoodie.compaction.strategy | NO 
| org.apache.hudi.io.compact.strategy.LogFileSizeBasedCompactionStrategy | 
Compaction strategy decides which file groups are picked up for compaction 
during each compaction run. By default, Hudi picks the log file with the most 
accumulated unmerged data. |
+| withTargetIOPerCompactionInMB(targetIOPerCompactionInMB) | 
hoodie.compaction.target.io | NO | 500000 | Amount of MBs to spend during a 
compaction run for the LogFileSizeBasedCompactionStrategy. This value helps 
bound ingestion latency while compaction runs in inline mode. |
+| withTargetPartitionsPerDayBasedCompaction(targetPartitionsPerCompaction) | 
hoodie.compaction.daybased.target | NO | 10 | Used by 
org.apache.hudi.io.compact.strategy.DayBasedCompactionStrategy to denote the 
number of latest partitions to compact during a compaction run. |
+| withPayloadClass(payloadClassName) | hoodie.compaction.payload.class | NO | 
org.apache.hudi.common.model.HoodieAvroPayload | This needs to be the same as 
the class used during insert/upserts. Just like writing, compaction also uses 
the record payload class to merge records in the log against each other, merge 
again with the base file and produce the final record to be written after 
compaction. |
+
+</div>
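As a rough illustration of the insertSplitSize arithmetic in the table above, the Python sketch below derives the approximate number of records per file from a target file size and an average record size. The helper name `records_per_file` is hypothetical and not part of Hudi; the 100MB and 1KB figures are the ones quoted in the remarks.

```python
def records_per_file(target_file_bytes: int, avg_record_bytes: int) -> int:
    """Approximate number of records that fit in one file of the target size.

    Hypothetical helper for illustration only; not a Hudi API.
    """
    if avg_record_bytes <= 0:
        raise ValueError("avg_record_bytes must be positive")
    return target_file_bytes // avg_record_bytes


# 100MB files with 1KB records: roughly 100K records per file.
print(records_per_file(100 * 1024 * 1024, 1024))  # 102400
```

Under these assumptions, the default of 500000 overprovisions the split size by roughly a factor of five, as the remarks state.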
 
 ### Bootstrap Configs
 Controls bootstrap related configs. If you want to bootstrap your data for the 
first time into hudi, this bootstrap operation will come in handy as you don't 
need to wait for entire data to be loaded into hudi to start leveraging hudi. 
 
 [withBootstrapConfig](#withBootstrapConfig) (HoodieBootstrapConfig) <br/>
 
-#### withBootstrapBasePath(basePath) {#withBootstrapBasePath}
-Property: `hoodie.bootstrap.base.path` <br/>
-<span style="color:grey"> Base path of the dataset that needs to be 
bootstrapped as a Hudi table </span> 
-
-#### withBootstrapParallelism(parallelism = 1500) {#withBootstrapParallelism}
-Property: `hoodie.bootstrap.parallelism` <br/>
-<span style="color:grey"> Parallelism value to be used to bootstrap data into 
hudi </span>
-
-#### withBootstrapKeyGenClass(keyGenClass) (#withBootstrapKeyGenClass)
-Property: `hoodie.bootstrap.keygen.class` <br/>
-<span style="color:grey"> Key generator implementation to be used for 
generating keys from the bootstrapped dataset </span>
-
-#### withBootstrapModeSelector(partitionSelectorClass = 
org.apache.hudi.client.bootstrap.selector.MetadataOnlyBootstrapModeSelector) 
{#withBootstrapModeSelector}
-Property: `hoodie.bootstrap.mode.selector` <br/>
-<span style="color:grey"> Selects the mode in which each file/partition in the 
bootstrapped dataset gets bootstrapped</span>
-
-#### withBootstrapPartitionPathTranslatorClass(partitionPathTranslatorClass = 
org.apache.hudi.client.bootstrap.translator.IdentityBootstrapPartitionPathTranslator)
 {#withBootstrapPartitionPathTranslatorClass}
-Property: `hoodie.bootstrap.partitionpath.translator.class` <br/>
-<span style="color:grey"> Translates the partition paths from the bootstrapped 
data into how is laid out as a Hudi table. </span>
+<div class="table-wrapper" markdown="block">
 
-#### withFullBootstrapInputProvider(partitionSelectorClass = 
org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider) 
{#withFullBootstrapInputProvider}
-Property: `hoodie.bootstrap.full.input.provider` <br/>
-<span style="color:grey"> Class to use for reading the bootstrap dataset 
partitions/files, for Bootstrap mode `FULL_RECORD` </span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| withBootstrapBasePath(basePath) | hoodie.bootstrap.base.path | YES | N/A | 
Base path of the dataset that needs to be bootstrapped as a Hudi table. |
+| withBootstrapParallelism(parallelism) | hoodie.bootstrap.parallelism | NO | 
1500 | Parallelism value to be used to bootstrap data into hudi. |
+| withBootstrapKeyGenClass(keyGenClass) | hoodie.bootstrap.keygen.class | YES 
| N/A | Key generator implementation to be used for generating keys from the 
bootstrapped dataset. |
+| withBootstrapModeSelector(partitionSelectorClass) | 
hoodie.bootstrap.mode.selector | NO | 
org.apache.hudi.client.bootstrap.selector.MetadataOnlyBootstrapModeSelector | 
Bootstrap Mode Selector class. By default, Hudi employs METADATA_ONLY bootstrap 
for all partitions. |
+| withBootstrapPartitionPathTranslatorClass(partitionPathTranslatorClass) | 
hoodie.bootstrap.partitionpath.translator.class | NO | 
org.apache.hudi.client.bootstrap.translator.IdentityBootstrapPartitionPathTranslator 
| For METADATA_ONLY bootstrap, this class allows customization of partition 
paths used in the Hudi target dataset. By default, no customization is done and 
the partition paths reflect what is available in the source parquet table. |
+| withFullBootstrapInputProvider(partitionSelectorClass) | 
hoodie.bootstrap.full.input.provider | NO | 
org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider | For FULL_RECORD 
bootstrap, this class is used for reading the bootstrap dataset 
partitions/files and provides the input RDD of Hudi records to write. |
+| withBootstrapModeSelectorRegex(regex) | hoodie.bootstrap.mode.selector.regex 
| NO | .* | Partition regex used when hoodie.bootstrap.mode.selector is set to 
BootstrapRegexModeSelector. Matches each bootstrap dataset partition against 
this regex and applies the mode below to it. |
+| withBootstrapModeForRegexMatch(modeForRegexMatch) | 
hoodie.bootstrap.mode.selector.regex.mode | NO | 
org.apache.hudi.client.bootstrap.METADATA_ONLY | Bootstrap Mode used when the 
partition matches the regex pattern in hoodie.bootstrap.mode.selector.regex. 
Used only when hoodie.bootstrap.mode.selector is set to 
BootstrapRegexModeSelector. METADATA_ONLY will generate just skeleton base 
files with key
 
-#### withBootstrapModeSelectorRegex(regex = ".*") 
{#withBootstrapModeSelectorRegex}
-Property: `hoodie.bootstrap.mode.selector.regex` <br/>
-<span style="color:grey"> Matches each bootstrap dataset partition against 
this regex and applies the mode below to it. </span>
-
-#### withBootstrapModeForRegexMatch(modeForRegexMatch = 
org.apache.hudi.client.bootstrap.METADATA_ONLY) 
-Property: `withBootstrapModeForRegexMatch` <br/>
-<span style="color:grey"> Bootstrap mode to apply for partition paths, that 
match regex above. `METADATA_ONLY` will generate just skeleton base files
-with keys/footers, avoiding full cost of rewriting the dataset. `FULL_RECORD` 
will perform a full copy/rewrite of the data as a Hudi table. </span>
+</div>
 
 ### Metadata Config
 Configurations used by the HUDI Metadata Table. This table maintains the meta 
information stored in hudi dataset so that listing can be avoided during 
queries. 
 
 [withMetadataConfig](#withMetadataConfig) (HoodieMetadataConfig) <br/>
 
-#### enable(enable = false) {#enable}
-Property: `hoodie.metadata.enable` <br/>
-<span style="color:grey"> Enable the internal Metadata Table which stores 
table level metadata such as file listings </span>
-
-#### enableReuse(enable = true) {#enable}
-Property: `hoodie.metadata.reuse.enable` <br/>
-<span style="color:grey"> Enable reusing of opened file handles/merged logs, 
across multiple fetches from metadata table. </span>
-
-#### enableFallback(enable = true) {#enable}
-Property: `hoodie.metadata.fallback.enable` <br/>
-<span style="color:grey"> Fallback to listing from DFS, if there are any 
errors in fetching from metadata table </span>
-
-#### validate(validate = false) {#validate}
-Property: `hoodie.metadata.validate` <br/>
-<span style="color:grey"> Validate contents of Metadata Table on each access 
against the actual listings from DFS</span>
+<div class="table-wrapper" markdown="block">
 
-#### withInsertParallelism(parallelism = 1) {#withInsertParallelism}
-Property: `hoodie.metadata.insert.parallelism` <br/>
-<span style="color:grey"> Parallelism to use when writing to the metadata 
table </span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| enable(enable) | hoodie.metadata.enable | NO | false | Enable the internal 
Metadata Table which stores table level metadata such as file listings. |
+| enableReuse(enable) | hoodie.metadata.reuse.enable | NO | true | Enable 
reusing of opened file handles/merged logs, across multiple fetches from 
metadata table. |
+| enableFallback(enable) | hoodie.metadata.fallback.enable | NO | true | 
Fallback to listing from DFS, if there are any errors in fetching from metadata 
table. |
+| validate(validate) | hoodie.metadata.validate | NO | false | Validate 
contents of Metadata Table on each access against the actual listings from DFS. 
|
+| withInsertParallelism(parallelism) | hoodie.metadata.insert.parallelism | NO 
| 1 | Parallelism to use when writing to the metadata table. |
+| withMaxNumDeltaCommitsBeforeCompaction(maxNumDeltaCommitsBeforeCompaction) | 
hoodie.metadata.compact.max.delta.commits | NO | 24 | Controls how often the 
metadata table is compacted. |
+| archiveCommitsWith(minToKeep, maxToKeep) | hoodie.metadata.keep.min.commits, 
hoodie.metadata.keep.max.commits | NO | minToKeep = 20, maxToKeep = 30 | 
Controls the archival of the metadata table’s timeline. |
+| withAssumeDatePartitioning(assumeDatePartitioning) | 
hoodie.assume.date.partitioning | NO | false | Whether HoodieWriteClient should 
assume the data is partitioned by dates, i.e., three levels from the base path. 
This is a stop-gap to support tables created by versions < 0.3.1. Will be 
removed eventually. |
 
-#### withMaxNumDeltaCommitsBeforeCompaction(maxNumDeltaCommitsBeforeCompaction 
= 24) {#enable}
-Property: `hoodie.metadata.compact.max.delta.commits` <br/>
-<span style="color:grey"> Controls how often the metadata table is 
compacted.</span>
-
-#### archiveCommitsWith(minToKeep = 20, maxToKeep = 30) {#enable}
-Property: `hoodie.metadata.keep.min.commits`, 
`hoodie.metadata.keep.max.commits` <br/>
-<span style="color:grey"> Controls the archival of the metadata table's 
timeline </span>
-
-#### withAssumeDatePartitioning(assumeDatePartitioning = false) 
{#withAssumeDatePartitioning}
-Property: `hoodie.assume.date.partitioning`<br/>
-<span style="color:grey">Should HoodieWriteClient assume the data is 
partitioned by dates, i.e three levels from base path. This is a stop-gap to 
support tables created by versions < 0.3.1. Will be removed eventually </span>
+</div>
 
 ### Clustering Configs
 Controls clustering operations in hudi. Each clustering has to be configured 
for its strategy, and config params. This config drives the same. 
 
 [withClusteringConfig](#withClusteringConfig) (HoodieClusteringConfig) <br/>
 
-#### withClusteringPlanStrategyClass(clusteringStrategyClass = 
org.apache.hudi.client.clustering.plan.strategy.SparkRecentDaysClusteringPlanStrategy)
 {#withClusteringPlanStrategyClass}
-Property: `hoodie.clustering.plan.strategy.class` <br/>
-<span style="color:grey"> Config to provide a strategy class to create 
ClusteringPlan. Class has to be subclass of ClusteringPlanStrategy </span>
-
-#### withClusteringExecutionStrategyClass(runClusteringStrategyClass = 
org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy)
 {#withClusteringExecutionStrategyClass}
-Property: `hoodie.clustering.execution.strategy.class` <br/>
-<span style="color:grey"> Config to provide a strategy class to execute a 
ClusteringPlan. Class has to be subclass of RunClusteringStrategy </span>
-
-#### withClusteringTargetPartitions(clusteringTargetPartitions = 2) 
{#withClusteringTargetPartitions}
-Property: `hoodie.clustering.plan.strategy.daybased.lookback.partitions` <br/>
-<span style="color:grey"> Number of partitions to list to create 
ClusteringPlan </span>
+<div class="table-wrapper" markdown="block">
 
-#### withClusteringPlanSmallFileLimit(clusteringSmallFileLimit = 600Mb) 
{#withClusteringPlanSmallFileLimit}
-Property: `hoodie.clustering.plan.strategy.small.file.limit` <br/>
-<span style="color:grey"> Files smaller than the size specified here are 
candidates for clustering </span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| withClusteringPlanStrategyClass(clusteringStrategyClass) | 
hoodie.clustering.plan.strategy.class | NO | 
org.apache.hudi.client.clustering.plan.strategy.SparkRecentDaysClusteringPlanStrategy
 | Config to provide a strategy class to create ClusteringPlan. Class has to be 
subclass of ClusteringPlanStrategy. |
+| withClusteringExecutionStrategyClass(runClusteringStrategyClass) | 
hoodie.clustering.execution.strategy.class | NO | 
org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy
 | Config to provide a strategy class to execute a ClusteringPlan. Class has to 
be subclass of RunClusteringStrategy. |
+| withClusteringTargetPartitions(clusteringTargetPartitions) | 
hoodie.clustering.plan.strategy.daybased.lookback.partitions | NO | 2 | Number 
of partitions to list to create ClusteringPlan. |
+| withClusteringPlanSmallFileLimit(clusteringSmallFileLimit) | 
hoodie.clustering.plan.strategy.small.file.limit | NO | 629145600(600Mb) | 
Files smaller than the size specified here are candidates for clustering. |
+| withClusteringMaxBytesInGroup(clusteringMaxGroupSize) | 
hoodie.clustering.plan.strategy.max.bytes.per.group | NO | 2147483648(2Gb) | 
Max amount of data to be included in one group. Each clustering operation can 
create multiple groups. The total amount of data processed by a clustering 
operation is bounded by the product of two properties 
(CLUSTERING_MAX_BYTES_PER_GROUP * CLUSTERING_MAX_NUM_GROUPS). |
+| withClusteringMaxNumGroups(maxNumGroups) | 
hoodie.clustering.plan.strategy.max.num.groups | NO | 30 | Maximum number of 
groups to create as part of ClusteringPlan. Increasing groups will increase 
parallelism. |
+| withClusteringTargetFileMaxBytes(targetFileSize) | 
hoodie.clustering.plan.strategy.target.file.max.bytes | NO | 1073741824(1Gb) | 
Each group can produce ‘N’ 
(CLUSTERING_MAX_GROUP_SIZE/CLUSTERING_TARGET_FILE_SIZE) output file groups. |
 
-#### withClusteringMaxBytesInGroup(clusteringMaxGroupSize = 2Gb) 
{#withClusteringMaxBytesInGroup}
-Property: `hoodie.clustering.plan.strategy.max.bytes.per.group` <br/>
-<span style="color:grey"> Max amount of data to be included in one group
-Each clustering operation can create multiple groups. Total amount of data 
processed by clustering operation is defined by below two properties 
(CLUSTERING_MAX_BYTES_PER_GROUP * CLUSTERING_MAX_NUM_GROUPS). </span>
-
-#### withClusteringMaxNumGroups(maxNumGroups = 30) 
{#withClusteringMaxNumGroups}
-Property : `hoodie.clustering.plan.strategy.max.num.groups` <br/>
-<span style="color:grey"> Maximum number of groups to create as part of 
ClusteringPlan. Increasing groups will increase parallelism. </span>
-
-#### withClusteringTargetFileMaxBytes(targetFileSize = 1Gb ) 
{#withClusteringTargetFileMaxBytes}
-Property: `hoodie.clustering.plan.strategy.target.file.max.bytes` <br/>
-<span style="color:grey"> Each group can produce 'N' 
(CLUSTERING_MAX_GROUP_SIZE/CLUSTERING_TARGET_FILE_SIZE) output file groups 
</span>
+</div>
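The sizing relationships described in the clustering table can be worked through numerically. The sketch below is illustrative only: `clustering_budget` is a hypothetical helper, not a Hudi function, and rounding the per-group file count up is an assumption (the table only gives the ratio).

```python
from typing import Tuple


def clustering_budget(max_bytes_per_group: int,
                      max_num_groups: int,
                      target_file_max_bytes: int) -> Tuple[int, int]:
    """Return (total bytes one clustering operation may process,
    output file groups produced per clustering group).

    Hypothetical helper for illustration; not part of Hudi.
    """
    total_bytes = max_bytes_per_group * max_num_groups
    # Ceiling division: assumes a partially filled last file still counts.
    files_per_group = -(-max_bytes_per_group // target_file_max_bytes)
    return total_bytes, files_per_group


# Defaults from the table: 2Gb per group, 30 groups, 1Gb target file size.
total, per_group = clustering_budget(2147483648, 30, 1073741824)
print(total, per_group)  # 64424509440 2
```

With the listed defaults, a single clustering operation would touch at most about 60Gb of data, and each group would produce two output file groups.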
 
 ### Payload Configs
 Payload related configs. This config can be leveraged by payload 
implementations to determine their business logic. 
 
 [withPayloadConfig](#withPayloadConfig) (HoodiePayloadConfig) <br/>
 
-#### withPayloadOrderingField(payloadOrderingField = "ts") 
{#withPayloadOrderingField}
-Property: `hoodie.payload.ordering.field` <br/>
-<span style="color:grey"> Property to hold the payload ordering field name. 
</span>
+<div class="table-wrapper" markdown="block">
+
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| withPayloadOrderingField(payloadOrderingField) | 
hoodie.payload.ordering.field | NO | ts | Property to hold the payload ordering 
field name. |
+
+</div>
 
 ### Metrics configs
 
@@ -716,99 +447,61 @@ Enables reporting on Hudi metrics.
 
 #### GRAPHITE
 
-##### on(metricsOn = false) {#on}
-`hoodie.metrics.on` <br/>
-<span style="color:grey">Turn on/off metrics reporting. off by default.</span>
+<div class="table-wrapper" markdown="block">
 
-##### withReporterType(reporterType = GRAPHITE) {#withReporterType}
-Property: `hoodie.metrics.reporter.type` <br/>
-<span style="color:grey">Type of metrics reporter.</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| on(metricsOn) | hoodie.metrics.on | NO | false | Turn on/off metrics 
reporting. off by default. |
+| withReporterType(reporterType) | hoodie.metrics.reporter.type | NO | 
GRAPHITE | Type of metrics reporter. |
+| toGraphiteHost(host) | hoodie.metrics.graphite.host | NO | localhost | 
Graphite host to connect to. |
+| onGraphitePort(port) | hoodie.metrics.graphite.port | NO | 4756 | Graphite 
port to connect to. |
+| usePrefix(prefix) | hoodie.metrics.graphite.metric.prefix | NO | "" | 
Standard prefix applied to all metrics. This helps to add information such as 
datacenter or environment. |
 
-##### toGraphiteHost(host = localhost) {#toGraphiteHost}
-Property: `hoodie.metrics.graphite.host` <br/>
-<span style="color:grey">Graphite host to connect to</span>
-
-##### onGraphitePort(port = 4756) {#onGraphitePort}
-Property: `hoodie.metrics.graphite.port` <br/>
-<span style="color:grey">Graphite port to connect to</span>
-
-##### usePrefix(prefix = "") {#usePrefix}
-Property: `hoodie.metrics.graphite.metric.prefix` <br/>
-<span style="color:grey">Standard prefix applied to all metrics. This helps to 
add datacenter, environment information for e.g</span>
+</div>
 
 #### JMX
 
-##### on(metricsOn = false) {#on}
-`hoodie.metrics.on` <br/>
-<span style="color:grey">Turn on/off metrics reporting. off by default.</span>
-
-##### withReporterType(reporterType = JMX) {#withReporterType}
-Property: `hoodie.metrics.reporter.type` <br/>
-<span style="color:grey">Type of metrics reporter.</span>
+<div class="table-wrapper" markdown="block">
 
-##### toJmxHost(host = localhost) {#toJmxHost}
-Property: `hoodie.metrics.jmx.host` <br/>
-<span style="color:grey">Jmx host to connect to</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| on(metricsOn) | hoodie.metrics.on | NO | false | Turn on/off metrics 
reporting. off by default. |
+| withReporterType(reporterType) | hoodie.metrics.reporter.type | NO | 
GRAPHITE | Type of metrics reporter. Set this to JMX to enable the JMX 
reporter. |
+| toJmxHost(host) | hoodie.metrics.jmx.host | NO | localhost | Jmx host to 
connect to. |
+| onJmxPort(port) | hoodie.metrics.jmx.port | NO | 9889 | Jmx port to connect 
to. |
 
-##### onJmxPort(port = 1000-5000) {#onJmxPort}
-Property: `hoodie.metrics.jmx.port` <br/>
-<span style="color:grey">Jmx port to connect to</span>
+</div>
 
 #### DATADOG
 
-##### on(metricsOn = false) {#on}
-`hoodie.metrics.on` <br/>
-<span style="color:grey">Turn on/off metrics reporting. off by default.</span>
+<div class="table-wrapper" markdown="block">
 
-##### withReporterType(reporterType = DATADOG) {#withReporterType}
-Property: `hoodie.metrics.reporter.type` <br/>
-<span style="color:grey">Type of metrics reporter.</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| on(metricsOn) | hoodie.metrics.on | NO | false | Turn on/off metrics 
reporting. off by default. |
+| withReporterType(reporterType) | hoodie.metrics.reporter.type | NO | 
GRAPHITE | Type of metrics reporter. Set this to DATADOG to enable the Datadog 
reporter. |
+| withDatadogReportPeriodSeconds(period) | 
hoodie.metrics.datadog.report.period.seconds | NO | 30 | Datadog report period 
in seconds. Defaults to 30. |
+| withDatadogApiSite(apiSite) | hoodie.metrics.datadog.api.site | YES | N/A | 
Datadog API site: EU or US. |
+| withDatadogApiKeySkipValidation(skip) | 
hoodie.metrics.datadog.api.key.skip.validation | NO | false | Whether to skip 
validating the Datadog API key before sending metrics via the Datadog API. 
Defaults to false. |
+| withDatadogApiKey(apiKey) | hoodie.metrics.datadog.api.key | YES if 
apiKeySupplier is not set | N/A | Datadog API key. |
+| withDatadogApiKeySupplier(apiKeySupplier) | 
hoodie.metrics.datadog.api.key.supplier | YES if apiKey is not set  | N/A | 
Datadog API key supplier to supply the API key at runtime. This will take 
effect if hoodie.metrics.datadog.api.key is not set. |
+| withDatadogApiTimeoutSeconds(timeout) | 
hoodie.metrics.datadog.api.timeout.seconds | NO | 3 | Datadog API timeout in 
seconds. Defaults to 3. |
+| withDatadogPrefix(prefix) | hoodie.metrics.datadog.metric.prefix | NO |  | 
Datadog metric prefix to be prepended to each metric name with a dot as 
delimiter. For example, if it is set to foo, foo. will be prepended. |
+| withDatadogHost(host) | hoodie.metrics.datadog.metric.host | NO |  | Datadog 
metric host to be sent along with metrics data. |
+| withDatadogTags(tags) | hoodie.metrics.datadog.metric.tags | NO |  | Datadog 
metric tags (comma-delimited) to be sent along with metrics data. |
 
-##### withDatadogReportPeriodSeconds(period = 30) 
{#withDatadogReportPeriodSeconds}
-Property: `hoodie.metrics.datadog.report.period.seconds` <br/>
-<span style="color:grey">Datadog report period in seconds. Default to 
30.</span>
-
-##### withDatadogApiSite(apiSite) {#withDatadogApiSite}
-Property: `hoodie.metrics.datadog.api.site` <br/>
-<span style="color:grey">Datadog API site: EU or US</span>
-
-##### withDatadogApiKey(apiKey) {#withDatadogApiKey}
-Property: `hoodie.metrics.datadog.api.key` <br/>
-<span style="color:grey">Datadog API key</span>
-
-##### withDatadogApiKeySkipValidation(skip = false) 
{#withDatadogApiKeySkipValidation}
-Property: `hoodie.metrics.datadog.api.key.skip.validation` <br/>
-<span style="color:grey">Before sending metrics via Datadog API, whether to 
skip validating Datadog API key or not. Default to false.</span>
-
-##### withDatadogApiKeySupplier(apiKeySupplier) {#withDatadogApiKeySupplier}
-Property: `hoodie.metrics.datadog.api.key.supplier` <br/>
-<span style="color:grey">Datadog API key supplier to supply the API key at 
runtime. This will take effect if `hoodie.metrics.datadog.api.key` is not 
set.</span>
-
-##### withDatadogApiTimeoutSeconds(timeout = 3) {#withDatadogApiTimeoutSeconds}
-Property: `hoodie.metrics.datadog.metric.prefix` <br/>
-<span style="color:grey">Datadog API timeout in seconds. Default to 3.</span>
-
-##### withDatadogPrefix(prefix) {#withDatadogPrefix}
-Property: `hoodie.metrics.datadog.metric.prefix` <br/>
-<span style="color:grey">Datadog metric prefix to be prepended to each metric 
name with a dot as delimiter. For example, if it is set to `foo`, `foo.` will 
be prepended.</span>
-
-##### withDatadogHost(host) {#withDatadogHost}
-Property: `hoodie.metrics.datadog.metric.host` <br/>
-<span style="color:grey">Datadog metric host to be sent along with metrics 
data.</span>
-
-##### withDatadogTags(tags) {#withDatadogTags}
-Property: `hoodie.metrics.datadog.metric.tags` <br/>
-<span style="color:grey">Datadog metric tags (comma-delimited) to be sent 
along with metrics data.</span>
+</div>
 
 #### USER DEFINED REPORTER
 
-##### on(metricsOn = false) {#on}
-`hoodie.metrics.on` <br/>
-<span style="color:grey">Turn on/off metrics reporting. off by default.</span>
+<div class="table-wrapper" markdown="block">
 
-##### withReporterClass(className = "") {#withReporterClass}
-Property: `hoodie.metrics.reporter.class` <br/>
-<span style="color:grey">User-defined class used to report metrics, must be a 
subclass of AbstractUserDefinedMetricsReporter.</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| on(metricsOn) | hoodie.metrics.on | NO | false | Turn on/off metrics 
reporting. off by default. |
+| withReporterClass(className) | hoodie.metrics.reporter.class | NO | "" | 
User-defined class used to report metrics, must be a subclass of 
AbstractUserDefinedMetricsReporter. |
+
+</div>
 
 
 ### Memory configs
@@ -816,128 +509,81 @@ Controls memory usage for compaction and merges, 
performed internally by Hudi
 [withMemoryConfig](#withMemoryConfig) (HoodieMemoryConfig) <br/>
 <span style="color:grey">Memory related configs</span>
 
-#### withMaxMemoryFractionPerPartitionMerge(maxMemoryFractionPerPartitionMerge 
= 0.6) {#withMaxMemoryFractionPerPartitionMerge} 
-Property: `hoodie.memory.merge.fraction` <br/>
-<span style="color:grey">This fraction is multiplied with the user memory 
fraction (1 - spark.memory.fraction) to get a final fraction of heap space to 
use during merge </span>
+<div class="table-wrapper" markdown="block">
 
-#### withMaxMemorySizePerCompactionInBytes(maxMemorySizePerCompactionInBytes = 
1GB) {#withMaxMemorySizePerCompactionInBytes} 
-Property: `hoodie.memory.compaction.fraction` <br/>
-<span style="color:grey">HoodieCompactedLogScanner reads logblocks, converts 
records to HoodieRecords and then merges these log blocks and records. At any 
point, the number of entries in a log block can be less than or equal to the 
number of entries in the corresponding parquet file. This can lead to OOM in 
the Scanner. Hence, a spillable map helps alleviate the memory pressure. Use 
this config to set the max allowable inMemory footprint of the spillable 
map.</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| withMaxMemoryFractionPerPartitionMerge(maxMemoryFractionPerPartitionMerge) | 
hoodie.memory.merge.fraction | NO | 0.6 | This fraction is multiplied with the 
user memory fraction (1 - spark.memory.fraction) to get a final fraction of 
heap space to use during merge. |
+| withMaxMemorySizePerCompactionInBytes(maxMemorySizePerCompactionInBytes) | 
hoodie.memory.compaction.fraction | NO | 1073741824(1Gb) | 
HoodieCompactedLogScanner reads logblocks, converts records to HoodieRecords 
and then merges these log blocks and records. At any point, the number of 
entries in a log block can be less than or equal to the number of entries in 
the corresponding parquet file. This can lead to OOM in the Scanner. Hence, a 
spillable map helps alleviate the memory pressure. [...]
+| withWriteStatusFailureFraction(failureFraction) | 
hoodie.memory.writestatus.failure.fraction | NO | 0.1 | This property controls 
what fraction of the failed records and exceptions is reported back to the 
driver. |
 
-#### withWriteStatusFailureFraction(failureFraction = 0.1) 
{#withWriteStatusFailureFraction}
-Property: `hoodie.memory.writestatus.failure.fraction` <br/>
-<span style="color:grey">This property controls what fraction of the failed 
record, exceptions we report back to driver</span>
+</div>
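To make the hoodie.memory.merge.fraction remark concrete, the sketch below multiplies the user memory fraction (1 - spark.memory.fraction) by the merge fraction. `merge_heap_fraction` is a hypothetical helper, and 0.6 for spark.memory.fraction is assumed here as Spark's usual default; substitute your own setting.

```python
def merge_heap_fraction(spark_memory_fraction: float,
                        hoodie_merge_fraction: float = 0.6) -> float:
    """Final fraction of heap space available during merge.

    Hypothetical helper for illustration; not part of Hudi.
    """
    if not 0.0 <= spark_memory_fraction <= 1.0:
        raise ValueError("spark_memory_fraction must be between 0 and 1")
    user_memory_fraction = 1.0 - spark_memory_fraction
    return user_memory_fraction * hoodie_merge_fraction


# Assumed spark.memory.fraction = 0.6, merge fraction = 0.6 (table default).
print(f"{merge_heap_fraction(0.6):.2f}")  # 0.24
```

Under these assumptions, roughly a quarter of the heap would be usable for merging.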
 
 ### Write commit callback configs
 Controls callback behavior on write commit. Exception will be thrown if user 
enabled the callback service and errors occurred during the process of 
callback. Currently support HTTP, Kafka type. 
 [withCallbackConfig](#withCallbackConfig) (HoodieWriteCommitCallbackConfig) 
<br/>
 <span style="color:grey">Callback related configs</span>
 
-##### writeCommitCallbackOn(callbackOn = false) {#writeCommitCallbackOn} 
-Property: `hoodie.write.commit.callback.on` <br/>
-<span style="color:grey">Turn callback on/off. off by default.</span>
+<div class="table-wrapper" markdown="block">
+
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| writeCommitCallbackOn(callbackOn) | hoodie.write.commit.callback.on | NO | 
false | Turn callback on/off. off by default. |
+| withCallbackClass(callbackClass) | hoodie.write.commit.callback.class | NO | 
org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback | Full path of the 
callback class, which must be a subclass of HoodieWriteCommitCallback. 
Defaults to org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback. |
 
-##### withCallbackClass(callbackClass) {#withCallbackClass} 
-Property: `hoodie.write.commit.callback.class` <br/>
-<span style="color:grey">Full path of callback class and must be a subclass of 
HoodieWriteCommitCallback class, 
org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback by default</span>
+</div>
 
 #### HTTP CALLBACK
 Callback via HTTP. Users do not need to select this type explicitly; it is 
the default.
 
 ##### withCallbackHttpUrl(url) {#withCallbackHttpUrl} 
-Property: `hoodie.write.commit.callback.http.url` <br/>
-<span style="color:grey">Callback host to be sent along with callback 
messages</span>
 
-##### withCallbackHttpTimeoutSeconds(timeoutSeconds = 3) 
{#withCallbackHttpTimeoutSeconds} 
-Property: `hoodie.write.commit.callback.http.timeout.seconds` <br/>
-<span style="color:grey">Callback timeout in seconds. 3 by default</span>
+<div class="table-wrapper" markdown="block">
 
-##### withCallbackHttpApiKey(apiKey) {#withCallbackHttpApiKey} 
-Property: `hoodie.write.commit.callback.http.api.key` <br/>
-<span style="color:grey">Http callback API key. 
hudi_write_commit_http_callback by default</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| withCallbackHttpUrl(url) | hoodie.write.commit.callback.http.url | YES | N/A 
| Callback host to be sent along with callback messages. |
+| withCallbackHttpTimeoutSeconds(timeoutSeconds) | 
hoodie.write.commit.callback.http.timeout.seconds | NO | 3 | Callback timeout 
in seconds. 3 by default. |
+| withCallbackHttpApiKey(apiKey) | hoodie.write.commit.callback.http.api.key | 
NO | hudi_write_commit_http_callback | Http callback API key. 
hudi_write_commit_http_callback by default. |
+
+</div>
 
 #### KAFKA CALLBACK
 To use the Kafka callback, set `hoodie.write.commit.callback.class` = 
`org.apache.hudi.utilities.callback.kafka.HoodieWriteCommitKafkaCallback`
 
-##### CALLBACK_KAFKA_BOOTSTRAP_SERVERS
-Property: `hoodie.write.commit.callback.kafka.bootstrap.servers` <br/>
-<span style="color:grey">Bootstrap servers of kafka callback cluster</span>
-
-##### CALLBACK_KAFKA_TOPIC
-Property: `hoodie.write.commit.callback.kafka.topic` <br/>
-<span style="color:grey">Kafka topic to be sent along with callback 
messages</span>
+<div class="table-wrapper" markdown="block">
 
-##### CALLBACK_KAFKA_PARTITION
-Property: `hoodie.write.commit.callback.kafka.partition` <br/>
-<span style="color:grey">partition of `CALLBACK_KAFKA_TOPIC`, 0 by 
default</span>
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| CALLBACK_KAFKA_BOOTSTRAP_SERVERS | 
hoodie.write.commit.callback.kafka.bootstrap.servers | YES | N/A | Bootstrap 
servers of kafka callback cluster. |
+| CALLBACK_KAFKA_TOPIC | hoodie.write.commit.callback.kafka.topic | YES | N/A 
| Kafka topic to be sent along with callback messages. |
+| CALLBACK_KAFKA_PARTITION | hoodie.write.commit.callback.kafka.partition | NO 
| 0 | Partition of CALLBACK_KAFKA_TOPIC, 0 by default. |
+| CALLBACK_KAFKA_ACKS | hoodie.write.commit.callback.kafka.acks | NO | all | 
Kafka acks level, all by default. |
+| CALLBACK_KAFKA_RETRIES | hoodie.write.commit.callback.kafka.retries | NO | 3 
| Times to retry. 3 by default. |
 
-##### CALLBACK_KAFKA_ACKS
-Property: `hoodie.write.commit.callback.kafka.acks` <br/>
-<span style="color:grey">kafka acks level, `all` by default</span>
-
-##### CALLBACK_KAFKA_RETRIES
-Property: `hoodie.write.commit.callback.kafka.retries` <br/>
-<span style="color:grey">Times to retry. 3 by default</span>
+</div>
 
 ### Locking configs
 Configs that control locking mechanisms if 
[WriteConcurrencyMode=optimistic_concurrency_control](#WriteConcurrencyMode) is 
enabled
 [withLockConfig](#withLockConfig) (HoodieLockConfig) <br/>
 
-#### withLockProvider(lockProvider = 
org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider) 
{#withLockProvider}
-Property: `hoodie.write.lock.provider` <br/>
-<span style="color:grey">Lock provider class name, user can provide their own 
implementation of LockProvider which should be subclass of 
org.apache.hudi.common.lock.LockProvider</span>
-
-#### withZkQuorum(zkQuorum) {#withZkQuorum}
-Property: `hoodie.write.lock.zookeeper.url` <br/>
-<span style="color:grey">Set the list of comma separated servers to connect 
to</span>
-
-#### withZkBasePath(zkBasePath) {#withZkBasePath}
-Property: `hoodie.write.lock.zookeeper.base_path` [Required] <br/>
-<span style="color:grey">The base path on Zookeeper under which to create a 
ZNode to acquire the lock. This should be common for all jobs writing to the 
same table</span>
-
-#### withZkPort(zkPort) {#withZkPort}
-Property: `hoodie.write.lock.zookeeper.port` [Required] <br/>
-<span style="color:grey">The connection port to be used for Zookeeper</span>
-
-#### withZkLockKey(zkLockKey) {#withZkLockKey}
-Property: `hoodie.write.lock.zookeeper.lock_key` [Required] <br/>
-<span style="color:grey">Key name under base_path at which to create a ZNode 
and acquire lock. Final path on zk will look like base_path/lock_key. We 
recommend setting this to the table name</span>
-
-#### withZkConnectionTimeoutInMs(connectionTimeoutInMs = 15000) 
{#withZkConnectionTimeoutInMs}
-Property: `hoodie.write.lock.zookeeper.connection_timeout_ms` <br/>
-<span style="color:grey">How long to wait when connecting to ZooKeeper before 
considering the connection a failure</span>
-
-#### withZkSessionTimeoutInMs(sessionTimeoutInMs = 60000) 
{#withZkSessionTimeoutInMs}
-Property: `hoodie.write.lock.zookeeper.session_timeout_ms` <br/>
-<span style="color:grey">How long to wait after losing a connection to 
ZooKeeper before the session is expired</span>
-
-#### withNumRetries(num_retries = 3) {#withNumRetries}
-Property: `hoodie.write.lock.num_retries` <br/>
-<span style="color:grey">Maximum number of times to retry by lock provider 
client</span>
-
-#### withRetryWaitTimeInMillis(retryWaitTimeInMillis = 5000) 
{#withRetryWaitTimeInMillis}
-Property: `hoodie.write.lock.wait_time_ms_between_retry` <br/>
-<span style="color:grey">Initial amount of time to wait between retries by 
lock provider client</span>
-
-#### withHiveDatabaseName(hiveDatabaseName) {#withHiveDatabaseName}
-Property: `hoodie.write.lock.hivemetastore.database` [Required] <br/>
-<span style="color:grey">The Hive database to acquire lock against</span>
-
-#### withHiveTableName(hiveTableName) {#withHiveTableName}
-Property: `hoodie.write.lock.hivemetastore.table` [Required] <br/>
-<span style="color:grey">The Hive table under the hive database to acquire 
lock against</span>
-
-#### withClientNumRetries(clientNumRetries = 0) {#withClientNumRetries}
-Property: `hoodie.write.lock.client.num_retries` <br/>
-<span style="color:grey">Maximum number of times to retry to acquire lock 
additionally from the hudi client</span>
-
-#### withRetryWaitTimeInMillis(retryWaitTimeInMillis = 10000) 
{#withRetryWaitTimeInMillis}
-Property: `hoodie.write.lock.client.wait_time_ms_between_retry` <br/>
-<span style="color:grey">Amount of time to wait between retries from the hudi 
client</span>
-
-#### withConflictResolutionStrategy(lockProvider = 
org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy)
 {#withConflictResolutionStrategy}
-Property: `hoodie.write.lock.conflict.resolution.strategy` <br/>
-<span style="color:grey">Lock provider class name, this should be subclass of 
org.apache.hudi.client.transaction.ConflictResolutionStrategy</span>
-
-
-
+<div class="table-wrapper" markdown="block">
+
+|  Option Name  | Property | Required | Default | Remarks |
+|  -----------  | -------- | -------- | ------- | ------- |
+| withLockProvider(lockProvider) | hoodie.write.lock.provider | NO | 
org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider | Lock 
provider class name. Users can provide their own implementation, which should 
be a subclass of org.apache.hudi.common.lock.LockProvider. |
+| withZkQuorum(zkQuorum) | hoodie.write.lock.zookeeper.url | NO |  | Set the 
list of comma separated servers to connect to. |
+| withZkBasePath(zkBasePath) | hoodie.write.lock.zookeeper.base_path | YES | 
N/A | The base path on Zookeeper under which to create a ZNode to acquire the 
lock. This should be common for all jobs writing to the same table. |
+| withZkPort(zkPort) | hoodie.write.lock.zookeeper.port | YES | N/A | The 
connection port to be used for Zookeeper. |
+| withZkLockKey(zkLockKey) | hoodie.write.lock.zookeeper.lock_key | YES | N/A 
| Key name under base_path at which to create a ZNode and acquire lock. Final 
path on zk will look like base_path/lock_key. We recommend setting this to the 
table name. |
+| withZkConnectionTimeoutInMs(connectionTimeoutInMs) | 
hoodie.write.lock.zookeeper.connection_timeout_ms | NO | 15000 | How long to 
wait when connecting to ZooKeeper before considering the connection a failure. |
+| withZkSessionTimeoutInMs(sessionTimeoutInMs) | 
hoodie.write.lock.zookeeper.session_timeout_ms | NO | 60000 | How long to wait 
after losing a connection to ZooKeeper before the session is expired. |
+| withNumRetries(num_retries) | hoodie.write.lock.num_retries | NO | 3 | 
Maximum number of times to retry by lock provider client. |
+| withRetryWaitTimeInMillis(retryWaitTimeInMillis) | 
hoodie.write.lock.wait_time_ms_between_retry | NO | 5000 | Initial amount of 
time to wait between retries by lock provider client. |
+| withHiveDatabaseName(hiveDatabaseName) | 
hoodie.write.lock.hivemetastore.database | YES | N/A | The Hive database to 
acquire lock against. |
+| withHiveTableName(hiveTableName) | hoodie.write.lock.hivemetastore.table | 
YES | N/A | The Hive table under the hive database to acquire lock against. |
+| withClientNumRetries(clientNumRetries) | 
hoodie.write.lock.client.num_retries | NO | 0 | Maximum number of times to 
retry to acquire lock additionally from the hudi client. |
+| withRetryWaitTimeInMillis(retryWaitTimeInMillis) | 
hoodie.write.lock.client.wait_time_ms_between_retry | NO | 10000 | Amount of 
time to wait between retries from the hudi client. |
+| withConflictResolutionStrategy(lockProvider) | 
hoodie.write.lock.conflict.resolution.strategy | NO | 
org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy
 | Conflict resolution strategy class name; this should be a subclass of 
org.apache.hudi.client.transaction.ConflictResolutionStrategy. |
+
+</div>
diff --git a/docs/_sass/hudi_style/_tables.scss 
b/docs/_sass/hudi_style/_tables.scss
index e40b16b..0c835e2 100644
--- a/docs/_sass/hudi_style/_tables.scss
+++ b/docs/_sass/hudi_style/_tables.scss
@@ -35,4 +35,8 @@ tr,
 td,
 th {
   vertical-align: middle;
+}
+
+.table-wrapper {
+  overflow-x: scroll;
 }
\ No newline at end of file
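
For readers of the locking table in this diff, here is a minimal sketch of how the ZooKeeper lock properties might be collected into a plain options map before being handed to a writer. The helper name `zk_lock_options`, its parameters, and the example host names are hypothetical and not part of Hudi. The property keys and defaults are taken from the lines in this commit, with `hoodie.write.lock.zookeeper.url` taken from the removed `withZkQuorum` entry.

```python
# Hypothetical helper (not a Hudi API): builds a dict of lock properties
# using the keys and defaults documented in the table above.
REQUIRED_ZK_KEYS = (
    "hoodie.write.lock.zookeeper.base_path",
    "hoodie.write.lock.zookeeper.port",
    "hoodie.write.lock.zookeeper.lock_key",
)


def zk_lock_options(zk_url, zk_port, base_path, lock_key, overrides=None):
    """Return lock properties as strings; raise if a required one is empty."""
    opts = {
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_url,
        "hoodie.write.lock.zookeeper.port": str(zk_port),
        "hoodie.write.lock.zookeeper.base_path": base_path,
        # The docs recommend using the table name as the lock key.
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        # Documented defaults, repeated here so they are easy to override.
        "hoodie.write.lock.zookeeper.connection_timeout_ms": "15000",
        "hoodie.write.lock.zookeeper.session_timeout_ms": "60000",
        "hoodie.write.lock.num_retries": "3",
        "hoodie.write.lock.wait_time_ms_between_retry": "5000",
    }
    opts.update(overrides or {})
    missing = [key for key in REQUIRED_ZK_KEYS if not opts.get(key)]
    if missing:
        raise ValueError("missing required lock properties: " + ", ".join(missing))
    return opts


if __name__ == "__main__":
    # Example values are placeholders, not real hosts or paths.
    for key, value in sorted(
        zk_lock_options("zk1,zk2,zk3", 2181, "/hudi/locks", "trips").items()
    ):
        print(f"{key}={value}")
```

The resulting key/value pairs could then be passed one by one through `option(...)` on a writer, in the same way as the other properties in this document.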
