This is an automated email from the ASF dual-hosted git repository.
sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 8925a3b [HUDI-2378] Add common and pre validate configs to website
(#3565)
8925a3b is described below
commit 8925a3bc4dc1b37ef1d796761cad69e295c18ef4
Author: rmahindra123 <[email protected]>
AuthorDate: Mon Aug 30 21:49:37 2021 -0700
[HUDI-2378] Add common and pre validate configs to website (#3565)
Co-authored-by: rmahindra123 <[email protected]>
---
website/docs/configurations.md | 54 ++++++++++++++++++-
.../versioned_docs/version-0.9.0/configurations.md | 63 +++++++++++++++++++++-
2 files changed, 114 insertions(+), 3 deletions(-)
diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index a5db0a6..97b51a1 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -4,7 +4,7 @@ keywords: [ configurations, default, flink options, spark,
configs, parameters ]
permalink: /docs/configurations.html
summary: This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at a few levels.
toc: true
-last_modified_at: 2021-08-26T22:21:44.177
+last_modified_at: 2021-08-30T20:08:15.950513
---
This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at a few levels.
@@ -472,6 +472,39 @@ By default false (the names of partition folders are only
partition values)<br><
---
+### PreCommit Validator Configurations {#PreCommit-Validator-Configurations}
+
+The following set of configurations helps validate new data before it is committed.
+
+`Config Class`: org.apache.hudi.config.HoodiePreCommitValidatorConfig<br></br>
+> #### hoodie.precommit.validators.single.value.sql.queries
+> Spark SQL queries to run on the table before committing new data, to validate state after the commit. Multiple queries separated by the ';' delimiter are supported. The expected result is included as part of each query, separated by '#'. Example query: 'query1#result1;query2#result2'. Note the \<TABLE_NAME\> variable is expected to be present in the query.<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: SINGLE_VALUE_SQL_QUERIES`<br></br>
+
+---
+
+> #### hoodie.precommit.validators.equality.sql.queries
+> Spark SQL queries to run on the table before committing new data, to validate state before and after the commit. Multiple queries separated by the ';' delimiter are supported. Example: 'select count(*) from \<TABLE_NAME\>'. Note \<TABLE_NAME\> is replaced by the table state before and after the commit.<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: EQUALITY_SQL_QUERIES`<br></br>
+
+---
+
+> #### hoodie.precommit.validators
+> Comma-separated list of class names to invoke to validate a commit<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: VALIDATOR_CLASS_NAMES`<br></br>
+
+---
+
+> #### hoodie.precommit.validators.inequality.sql.queries
+> Spark SQL queries to run on the table before committing new data, to validate state before and after the commit. Multiple queries separated by the ';' delimiter are supported. Example query: 'select count(*) from \<TABLE_NAME\> where col=null'. Note the \<TABLE_NAME\> variable is expected to be present in the query.<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: INEQUALITY_SQL_QUERIES`<br></br>
+
+---
+
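For illustration, the validator options above can be combined into one set of write options. Below is a minimal Python sketch: the option keys are from this section, while the validator class name and the `parse_single_value_queries` helper are assumptions added purely to illustrate the documented `query#result` format with the `;` delimiter.

```python
# Sketch: pre-commit validator write options as a plain dict.
# Option keys come from the section above; the validator class name is an
# assumed example, and the parsing helper below is hypothetical -- it only
# illustrates the documented "query1#result1;query2#result2" format.

single_value_queries = (
    "select count(*) from <TABLE_NAME> where col is null#0;"
    "select count(distinct _hoodie_record_key) from <TABLE_NAME>#100"
)

hudi_options = {
    # Assumed validator class name, for illustration only.
    "hoodie.precommit.validators":
        "org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator",
    "hoodie.precommit.validators.single.value.sql.queries": single_value_queries,
}

def parse_single_value_queries(raw):
    """Split the config value into (query, expected_result) pairs."""
    pairs = []
    for entry in raw.split(";"):
        query, expected = entry.rsplit("#", 1)
        pairs.append((query, expected))
    return pairs

for query, expected in parse_single_value_queries(single_value_queries):
    print(expected, "<-", query)
```

In a Spark job, a dict like this would be passed to the DataFrame writer via `.options(**hudi_options)` alongside the usual Hudi write configs.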
## Flink Sql Configs {#FLINK_SQL}
These configs control the Hudi Flink SQL source/sink connectors, providing the ability to define record keys, pick the write operation, specify how to merge records, enable/disable asynchronous compaction, or choose the query type to read.
@@ -2758,6 +2791,25 @@ Configurations that control the clustering table service
in hudi, which optimize
---
+### Common Configurations {#Common-Configurations}
+
+The following configurations are common across Hudi.
+
+`Config Class`: org.apache.hudi.common.config.HoodieCommonConfig<br></br>
+> #### hoodie.common.diskmap.compression.enabled
+> Turn on compression for BITCASK disk map used by the External Spillable
Map<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: DISK_MAP_BITCASK_COMPRESSION_ENABLED`<br></br>
+
+---
+
+> #### hoodie.common.spillable.diskmap.type
+> When handling input data that cannot be held in memory, a spillable disk map is employed to merge it with a file on storage. By default, a persistent hash map loosely based on bitcask is used, which offers O(1) inserts and lookups. Change this to `ROCKS_DB` to use RocksDB for handling the spill instead.<br></br>
+> **Default Value**: BITCASK (Optional)<br></br>
+> `Config Param: SPILLABLE_DISK_MAP_TYPE`<br></br>
+
+---
+
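A minimal sketch of how these two common configs might be set and checked. The keys, defaults, and the `BITCASK`/`ROCKS_DB` values are from this section; the validation helper itself is a hypothetical illustration, not a Hudi API.

```python
# Sketch: common configs from the section above, with an illustrative
# validation helper. Defaults noted in comments match this page.

ALLOWED_DISKMAP_TYPES = {"BITCASK", "ROCKS_DB"}  # values named above

common_cfg = {
    "hoodie.common.diskmap.compression.enabled": "true",  # default: true
    "hoodie.common.spillable.diskmap.type": "ROCKS_DB",   # default: BITCASK
}

def resolve_diskmap_type(cfg):
    """Return the configured spillable disk map type, falling back to the default."""
    value = cfg.get("hoodie.common.spillable.diskmap.type", "BITCASK")
    if value not in ALLOWED_DISKMAP_TYPES:
        raise ValueError(f"unsupported spillable diskmap type: {value}")
    return value

print(resolve_diskmap_type(common_cfg))
```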
### Metadata Configs {#Metadata-Configs}
Configurations used by the Hudi Metadata Table. This table maintains the metadata about a given Hudi table (e.g., file listings) to avoid the overhead of accessing cloud storage during queries.
diff --git a/website/versioned_docs/version-0.9.0/configurations.md
b/website/versioned_docs/version-0.9.0/configurations.md
index 31e5f9f..97b51a1 100644
--- a/website/versioned_docs/version-0.9.0/configurations.md
+++ b/website/versioned_docs/version-0.9.0/configurations.md
@@ -4,7 +4,7 @@ keywords: [ configurations, default, flink options, spark,
configs, parameters ]
permalink: /docs/configurations.html
summary: This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at a few levels.
toc: true
-last_modified_at: 2021-08-26T22:04:13.167
+last_modified_at: 2021-08-30T20:08:15.950513
---
This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at a few levels.
@@ -472,6 +472,39 @@ By default false (the names of partition folders are only
partition values)<br><
---
+### PreCommit Validator Configurations {#PreCommit-Validator-Configurations}
+
+The following set of configurations helps validate new data before it is committed.
+
+`Config Class`: org.apache.hudi.config.HoodiePreCommitValidatorConfig<br></br>
+> #### hoodie.precommit.validators.single.value.sql.queries
+> Spark SQL queries to run on the table before committing new data, to validate state after the commit. Multiple queries separated by the ';' delimiter are supported. The expected result is included as part of each query, separated by '#'. Example query: 'query1#result1;query2#result2'. Note the \<TABLE_NAME\> variable is expected to be present in the query.<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: SINGLE_VALUE_SQL_QUERIES`<br></br>
+
+---
+
+> #### hoodie.precommit.validators.equality.sql.queries
+> Spark SQL queries to run on the table before committing new data, to validate state before and after the commit. Multiple queries separated by the ';' delimiter are supported. Example: 'select count(*) from \<TABLE_NAME\>'. Note \<TABLE_NAME\> is replaced by the table state before and after the commit.<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: EQUALITY_SQL_QUERIES`<br></br>
+
+---
+
+> #### hoodie.precommit.validators
+> Comma-separated list of class names to invoke to validate a commit<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: VALIDATOR_CLASS_NAMES`<br></br>
+
+---
+
+> #### hoodie.precommit.validators.inequality.sql.queries
+> Spark SQL queries to run on the table before committing new data, to validate state before and after the commit. Multiple queries separated by the ';' delimiter are supported. Example query: 'select count(*) from \<TABLE_NAME\> where col=null'. Note the \<TABLE_NAME\> variable is expected to be present in the query.<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: INEQUALITY_SQL_QUERIES`<br></br>
+
+---
+
## Flink Sql Configs {#FLINK_SQL}
These configs control the Hudi Flink SQL source/sink connectors, providing the ability to define record keys, pick the write operation, specify how to merge records, enable/disable asynchronous compaction, or choose the query type to read.
@@ -1217,7 +1250,7 @@ Configurations that control write behavior on Hudi
tables. These can be directly
---
> #### hoodie.bulkinsert.user.defined.partitioner.class
-> If specified, this class will be used to re-partition records before they
are bulk inserted. This can be used to sort, pack, cluster data optimally for
common query patterns.<br></br>
+> If specified, this class will be used to re-partition records before they are bulk inserted. This can be used to sort, pack, or cluster data optimally for common query patterns. For now, one built-in user-defined bulk insert partitioner is supported, org.apache.hudi.execution.bulkinsert.RDDCustomColumnsSortPartitioner, which sorts based on the column values set by hoodie.bulkinsert.user.defined.partitioner.sort.columns<br></br>
> **Default Value**: N/A (Required)<br></br>
> `Config Param: BULKINSERT_USER_DEFINED_PARTITIONER_CLASS_NAME`<br></br>
@@ -1436,6 +1469,13 @@ Configurations that control write behavior on Hudi
tables. These can be directly
---
+> #### hoodie.bulkinsert.user.defined.partitioner.sort.columns
+> Columns to sort the data by when using org.apache.hudi.execution.bulkinsert.RDDCustomColumnsSortPartitioner as the user-defined partitioner during bulk_insert. For example, 'column1,column2'<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BULKINSERT_USER_DEFINED_PARTITIONER_SORT_COLUMNS`<br></br>
+
+---
+
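To make the sort-columns contract concrete, here is a hypothetical sketch. The option keys and the partitioner class name are from this section; the sample records and the in-memory sort are merely a stand-in for what the partitioner does with a 'column1,column2'-style value.

```python
# Sketch: the bulk_insert sort-columns options from the section above.
# The option keys and partitioner class name are from this page; the
# records and the in-memory sort below are a hypothetical stand-in for
# how records end up ordered by the configured columns.

options = {
    "hoodie.bulkinsert.user.defined.partitioner.class":
        "org.apache.hudi.execution.bulkinsert.RDDCustomColumnsSortPartitioner",
    "hoodie.bulkinsert.user.defined.partitioner.sort.columns": "city,ts",
}

records = [
    {"city": "sf", "ts": 3},
    {"city": "la", "ts": 2},
    {"city": "sf", "ts": 1},
]

# Split the comma-separated column list and sort by those columns in order.
sort_cols = options[
    "hoodie.bulkinsert.user.defined.partitioner.sort.columns"].split(",")
records.sort(key=lambda r: tuple(r[c] for c in sort_cols))
print(records)
```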
> #### hoodie.finalize.write.parallelism
> Parallelism for the write finalization internal operation, which involves removing any partially written files from lake storage before committing the write. Reduce this value if the high number of tasks incurs delays for smaller tables or low-latency writes.<br></br>
> **Default Value**: 1500 (Optional)<br></br>
@@ -2751,6 +2791,25 @@ Configurations that control the clustering table service
in hudi, which optimize
---
+### Common Configurations {#Common-Configurations}
+
+The following configurations are common across Hudi.
+
+`Config Class`: org.apache.hudi.common.config.HoodieCommonConfig<br></br>
+> #### hoodie.common.diskmap.compression.enabled
+> Turn on compression for BITCASK disk map used by the External Spillable
Map<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: DISK_MAP_BITCASK_COMPRESSION_ENABLED`<br></br>
+
+---
+
+> #### hoodie.common.spillable.diskmap.type
+> When handling input data that cannot be held in memory, a spillable disk map is employed to merge it with a file on storage. By default, a persistent hash map loosely based on bitcask is used, which offers O(1) inserts and lookups. Change this to `ROCKS_DB` to use RocksDB for handling the spill instead.<br></br>
+> **Default Value**: BITCASK (Optional)<br></br>
+> `Config Param: SPILLABLE_DISK_MAP_TYPE`<br></br>
+
+---
+
### Metadata Configs {#Metadata-Configs}
Configurations used by the Hudi Metadata Table. This table maintains the metadata about a given Hudi table (e.g., file listings) to avoid the overhead of accessing cloud storage during queries.