This is an automated email from the ASF dual-hosted git repository.
danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new e3894931de4 [DOCS] Added configurations of Hudi table, file-based SQL
source, Hudi error table, and timestamp key generator to configuration listing
(#11058)
e3894931de4 is described below
commit e3894931de489f222730972d76f783ffd67cccac
Author: Geser Dugarov <[email protected]>
AuthorDate: Sat Apr 20 07:44:31 2024 +0700
[DOCS] Added configurations of Hudi table, file-based SQL source, Hudi
error table, and timestamp key generator to configuration listing (#11058)
---
website/docs/basic_configurations.md | 91 ++++++++++++++++++++++++-
website/docs/configurations.md | 125 ++++++++++++++++++++++++++++++++++-
2 files changed, 214 insertions(+), 2 deletions(-)
diff --git a/website/docs/basic_configurations.md
b/website/docs/basic_configurations.md
index 2f18ad3e885..1fc301521e1 100644
--- a/website/docs/basic_configurations.md
+++ b/website/docs/basic_configurations.md
@@ -1,12 +1,13 @@
---
title: Basic Configurations
summary: This page covers the basic configurations you may use to write/read
Hudi tables. This page only features a subset of the most frequently used
configurations. For a full list of all configs, please visit the [All
Configurations](/docs/configurations) page.
-last_modified_at: 2024-04-15T09:56:05.413
+last_modified_at: 2024-04-19T18:21:42.88
---
This page covers the basic configurations you may use to write/read Hudi
tables. This page only features a subset of the most frequently used
configurations. For a full list of all configs, please visit the [All
Configurations](/docs/configurations) page.
- [**Hudi Table Config**](#TABLE_CONFIG): Basic Hudi Table configuration parameters.
- [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the
Hudi Spark Datasource, providing ability to define keys/partitioning, pick out
the write operation, specify how to merge records or choosing query type to
read.
- [**Flink Sql Configs**](#FLINK_SQL): These configs control the Hudi Flink
SQL source/sink connectors, providing ability to define record keys, pick out
the write operation, specify how to merge records, enable/disable asynchronous
compaction or choosing query type to read.
- [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource
uses a RDD based HoodieWriteClient API to actually perform writes to storage.
These configs provide deep control over lower level aspects like file sizing,
compression, parallelism, compaction, write schema, cleaning etc. Although Hudi
provides sane defaults, from time to time these configs may need to be tweaked to
optimize for specific workloads.
@@ -20,6 +21,56 @@ This page covers the basic configurations you may use to
write/read Hudi tables.
In the tables below **(N/A)** means there is no default value set
:::
+## Hudi Table Config {#TABLE_CONFIG}
+Basic Hudi Table configuration parameters.
+
+
+### Hudi Table Basic Configs {#Hudi-Table-Basic-Configs}
+Configurations of the Hudi table, such as type of ingestion, storage formats, Hive table name, etc. These configurations are loaded from hoodie.properties; they are usually set when initializing a path as a Hudi base path and never change during the lifetime of the table.
+
+
+
+
+[**Basic Configs**](#Hudi-Table-Basic-Configs-basic-configs)
+
+
+| Config Name | Default | Description [...]
+| ----------- | ------- | ----------- [...]
+| [hoodie.bootstrap.base.path](#hoodiebootstrapbasepath) | (N/A) | Base path of the dataset that needs to be bootstrapped as a Hudi table<br />`Config Param: BOOTSTRAP_BASE_PATH` [...]
+| [hoodie.database.name](#hoodiedatabasename) | (N/A) | Database name that will be used for incremental query. If different databases have the same table name during incremental query, we can set it to limit the table name under a specific database<br />`Config Param: DATABASE_NAME` [...]
+| [hoodie.table.checksum](#hoodietablechecksum) | (N/A) | Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.<br />`Config Param: TABLE_CHECKSUM`<br />`Since Version: 0.11.0` [...]
+| [hoodie.table.create.schema](#hoodietablecreateschema) | (N/A) | Schema used when creating the table, for the first time.<br />`Config Param: CREATE_SCHEMA` [...]
+| [hoodie.table.index.defs.path](#hoodietableindexdefspath) | (N/A) | Absolute path where the index definitions are stored<br />`Config Param: INDEX_DEFINITION_PATH`<br />`Since Version: 1.0.0` [...]
+| [hoodie.table.keygenerator.class](#hoodietablekeygeneratorclass) | (N/A) | Key generator class property for the hoodie table<br />`Config Param: KEY_GENERATOR_CLASS_NAME` [...]
+| [hoodie.table.keygenerator.type](#hoodietablekeygeneratortype) | (N/A) | Key generator type to determine the key generator class<br />`Config Param: KEY_GENERATOR_TYPE`<br />`Since Version: 1.0.0` [...]
+| [hoodie.table.metadata.partitions](#hoodietablemetadatapartitions) | (N/A) | Comma-separated list of metadata partitions that have been completely built and are in sync with the data table. These partitions are ready for use by the readers<br />`Config Param: TABLE_METADATA_PARTITIONS`<br />`Since Version: 0.11.0` [...]
+| [hoodie.table.metadata.partitions.inflight](#hoodietablemetadatapartitionsinflight) | (N/A) | Comma-separated list of metadata partitions whose building is in progress. These partitions are not yet ready for use by the readers.<br />`Config Param: TABLE_METADATA_PARTITIONS_INFLIGHT`<br />`Since Version: 0.11.0` [...]
+| [hoodie.table.name](#hoodietablename) | (N/A) | Table name that will be used for registering with Hive. Needs to be the same across runs.<br />`Config Param: NAME` [...]
+| [hoodie.table.partition.fields](#hoodietablepartitionfields) | (N/A) | Fields used to partition the table. Concatenated values of these fields are used as the partition path, by invoking toString()<br />`Config Param: PARTITION_FIELDS` [...]
+| [hoodie.table.precombine.field](#hoodietableprecombinefield) | (N/A) | Field used in preCombining before actual write. By default, when two records have the same key value, the largest value for the precombine field, determined by Object.compareTo(..), is picked.<br />`Config Param: PRECOMBINE_FIELD` [...]
+| [hoodie.table.recordkey.fields](#hoodietablerecordkeyfields) | (N/A) | Columns used to uniquely identify the table. Concatenated values of these fields are used as the record key component of HoodieKey.<br />`Config Param: RECORDKEY_FIELDS` [...]
+| [hoodie.table.secondary.indexes.metadata](#hoodietablesecondaryindexesmetadata) | (N/A) | The metadata of secondary indexes<br />`Config Param: SECONDARY_INDEXES_METADATA`<br />`Since Version: 0.13.0` [...]
+| [hoodie.timeline.layout.version](#hoodietimelinelayoutversion) | (N/A) | Version of the timeline used by the table.<br />`Config Param: TIMELINE_LAYOUT_VERSION` [...]
+| [hoodie.archivelog.folder](#hoodiearchivelogfolder) | archived | Path under the meta folder to store archived timeline instants.<br />`Config Param: ARCHIVELOG_FOLDER` [...]
+| [hoodie.bootstrap.index.class](#hoodiebootstrapindexclass) | org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex | Implementation to use for mapping base files to bootstrap base files that contain actual data.<br />`Config Param: BOOTSTRAP_INDEX_CLASS_NAME` [...]
+| [hoodie.bootstrap.index.enable](#hoodiebootstrapindexenable) | true | Whether or not this is a bootstrapped table, with bootstrap base data and a mapping index defined. Default: true.<br />`Config Param: BOOTSTRAP_INDEX_ENABLE` [...]
+| [hoodie.bootstrap.index.type](#hoodiebootstrapindextype) | HFILE | Bootstrap index type determines which implementation to use for mapping base files to bootstrap base files that contain actual data.<br />`Config Param: BOOTSTRAP_INDEX_TYPE`<br />`Since Version: 1.0.0` [...]
+| [hoodie.compaction.payload.class](#hoodiecompactionpayloadclass) | org.apache.hudi.common.model.DefaultHoodieRecordPayload | Payload class to use for performing compactions, i.e. merge delta logs with the current base file and then produce a new base file.<br />`Config Param: PAYLOAD_CLASS_NAME` [...]
+| [hoodie.compaction.payload.type](#hoodiecompactionpayloadtype) | HOODIE_AVRO_DEFAULT | org.apache.hudi.common.model.RecordPayloadType: Payload to use for merging records. AWS_DMS_AVRO: Provides support for seamlessly applying changes captured via Amazon Database Migration Service onto S3. HOODIE_AVRO: A payload to wrap an existing Hoodie Avro Record. Useful to create a HoodieRecord over existing GenericReco [...]
+| [hoodie.compaction.record.merger.strategy](#hoodiecompactionrecordmergerstrategy) | eeb8d96f-b1e4-49fd-bbf8-28ac514178e5 | Id of the merger strategy. Hudi will pick HoodieRecordMerger implementations in hoodie.datasource.write.record.merger.impls which have the same merger strategy id<br />`Config Param: RECORD_MERGER_STRATEGY`<br />`Since Version: 0.13.0` [...]
+| [hoodie.datasource.write.hive_style_partitioning](#hoodiedatasourcewritehive_style_partitioning) | false | Flag to indicate whether to use Hive style partitioning. If set true, the names of partition folders follow <partition_column_name>=<partition_value> format. By default false (the names of partition folders are only partition values)<br />`Config Param: HIVE_STYLE_PARTITIONING_ENABLE` [...]
+| [hoodie.partition.metafile.use.base.format](#hoodiepartitionmetafileusebaseformat) | false | If true, partition metafiles are saved in the same format as base files for this dataset (e.g. Parquet / ORC). If false (default), partition metafiles are saved as properties files.<br />`Config Param: PARTITION_METAFILE_USE_BASE_FORMAT` [...]
+| [hoodie.populate.meta.fields](#hoodiepopulatemetafields) | true | When enabled, populates all meta fields. When disabled, no meta fields are populated and incremental queries will not be functional. This is only meant to be used for append-only/immutable data for batch processing<br />`Config Param: POPULATE_META_FIELDS` [...]
+| [hoodie.table.base.file.format](#hoodietablebasefileformat) | PARQUET | Base file format to store all the base file data.<br />`Config Param: BASE_FILE_FORMAT` [...]
+| [hoodie.table.cdc.enabled](#hoodietablecdcenabled) | false | When enabled, persists the change data if necessary, which can then be queried in CDC query mode.<br />`Config Param: CDC_ENABLED`<br />`Since Version: 0.13.0` [...]
+| [hoodie.table.cdc.supplemental.logging.mode](#hoodietablecdcsupplementalloggingmode) | DATA_BEFORE_AFTER | org.apache.hudi.common.table.cdc.HoodieCDCSupplementalLoggingMode: Change log capture supplemental logging mode. The supplemental log is used for accelerating the generation of change log details. OP_KEY_ONLY: Only keeping record keys in the supplemental logs, so the reader needs to figure out the update before image and af [...]
+| [hoodie.table.log.file.format](#hoodietablelogfileformat) | HOODIE_LOG | Log format used for the delta logs.<br />`Config Param: LOG_FILE_FORMAT` [...]
+| [hoodie.table.multiple.base.file.formats.enable](#hoodietablemultiplebasefileformatsenable) | false | When set to true, the table can support reading and writing multiple base file formats.<br />`Config Param: MULTIPLE_BASE_FILE_FORMATS_ENABLE`<br />`Since Version: 1.0.0` [...]
+| [hoodie.table.timeline.timezone](#hoodietabletimelinetimezone) | LOCAL | User can set the hoodie commit timeline timezone, such as UTC, LOCAL and so on. LOCAL is the default<br />`Config Param: TIMELINE_TIMEZONE` [...]
+| [hoodie.table.type](#hoodietabletype) | COPY_ON_WRITE | The table type for the underlying data, for this write. This can't change between writes.<br />`Config Param: TYPE` [...]
+| [hoodie.table.version](#hoodietableversion) | ZERO | Version of the table, used for running upgrade/downgrade steps between releases with potentially breaking/backwards compatible changes.<br />`Config Param: VERSION` [...]
+---
+
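These table-level configs are persisted in hoodie.properties under the table's base path as Java-properties-style key=value lines. As a minimal sketch (not Hudi code; the sample keys and values below are illustrative, not taken from a real table), loading them could look like:

```python
def load_table_config(text: str) -> dict:
    """Parse key=value lines into a dict, skipping blanks and # comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg

# Illustrative hoodie.properties content (values hypothetical)
sample = """\
#Properties saved during table init
hoodie.table.name=trips
hoodie.table.type=COPY_ON_WRITE
hoodie.archivelog.folder=archived
"""
config = load_table_config(sample)
```

In real deployments Hudi itself reads and validates this file (including the `hoodie.table.checksum` entry); the sketch only shows the on-disk shape of the configs listed above.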
## Spark Datasource Configs {#SPARK_DATASOURCE}
These configs control the Hudi Spark Datasource, providing ability to define
keys/partitioning, pick out the write operation, specify how to merge records
or choosing query type to read.
@@ -270,6 +321,29 @@ Configurations that control compaction (merging of log
files onto a new base fil
---
+### Error table Configs {#Error-table-Configs}
+Configurations required for the error table.
+
+
+
+
+[**Basic Configs**](#Error-table-Configs-basic-configs)
+
+
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| [hoodie.errortable.base.path](#hoodieerrortablebasepath) | (N/A) | Base path for the error table, under which all error records would be stored.<br />`Config Param: ERROR_TABLE_BASE_PATH` |
+| [hoodie.errortable.target.table.name](#hoodieerrortabletargettablename) | (N/A) | Table name to be used for the error table<br />`Config Param: ERROR_TARGET_TABLE` |
+| [hoodie.errortable.write.class](#hoodieerrortablewriteclass) | (N/A) | Class which handles the error table writes. This config is used to configure a custom implementation for Error Table Writer. Specify the full class name of the custom error table writer as a value for this config<br />`Config Param: ERROR_TABLE_WRITE_CLASS` |
+| [hoodie.errortable.enable](#hoodieerrortableenable) | false | Config to enable the error table. If the config is enabled, all the records with processing errors in DeltaStreamer are transferred to the error table.<br />`Config Param: ERROR_TABLE_ENABLED` |
+| [hoodie.errortable.insert.shuffle.parallelism](#hoodieerrortableinsertshuffleparallelism) | 200 | Config to set insert shuffle parallelism. The config is similar to the hoodie.insert.shuffle.parallelism config but applies to the error table.<br />`Config Param: ERROR_TABLE_INSERT_PARALLELISM_VALUE` |
+| [hoodie.errortable.upsert.shuffle.parallelism](#hoodieerrortableupsertshuffleparallelism) | 200 | Config to set upsert shuffle parallelism. The config is similar to the hoodie.upsert.shuffle.parallelism config but applies to the error table.<br />`Config Param: ERROR_TABLE_UPSERT_PARALLELISM_VALUE` |
+| [hoodie.errortable.validate.recordcreation.enable](#hoodieerrortablevalidaterecordcreationenable) | true | Records that fail to be created due to key generation failure or other issues will be sent to the error table<br />`Config Param: ERROR_ENABLE_VALIDATE_RECORD_CREATION`<br />`Since Version: 0.14.2` |
+| [hoodie.errortable.validate.targetschema.enable](#hoodieerrortablevalidatetargetschemaenable) | false | Records with a schema mismatch with the target schema are sent to the error table.<br />`Config Param: ERROR_ENABLE_VALIDATE_TARGET_SCHEMA` |
+| [hoodie.errortable.write.failure.strategy](#hoodieerrortablewritefailurestrategy) | ROLLBACK_COMMIT | The config specifies the failure strategy if the error table write fails. Use one of - [ROLLBACK_COMMIT (Rollback the corresponding base table write commit for which the error events were triggered), LOG_ERROR (Error is logged but the base table write succeeds)]<br />`Config Param: ERROR_TABLE_WRITE_FAILURE_STRATEGY` |
+---
+
+
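Putting the configs above together, a sketch of the properties a Hudi Streamer job might pass to enable the error table (the bucket path and table name here are hypothetical, not defaults):

```python
# Hypothetical property set enabling the error table for a streamer job.
error_table_props = {
    "hoodie.errortable.enable": "true",
    # Hypothetical base path where error records would land
    "hoodie.errortable.base.path": "s3://my-bucket/hudi/error_table",
    # Hypothetical error-table name
    "hoodie.errortable.target.table.name": "trips_errors",
    # LOG_ERROR keeps the base table commit even if the error-table write
    # fails; the default ROLLBACK_COMMIT rolls the base commit back.
    "hoodie.errortable.write.failure.strategy": "LOG_ERROR",
}
```

With `LOG_ERROR`, a failed error-table write is logged but does not block ingestion, which is a common trade-off when error capture is best-effort.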
### Write Configurations {#Write-Configurations}
Configurations that control write behavior on Hudi tables. These can be
directly passed down from even higher level frameworks (e.g Spark datasources,
Flink sink) and utilities (e.g Hudi Streamer).
@@ -623,6 +697,21 @@ Configurations controlling the behavior of S3 source in
Hudi Streamer.
---
+#### File-based SQL Source Configs {#File-based-SQL-Source-Configs}
+Configurations controlling the behavior of File-based SQL Source in Hudi Streamer.
+
+
+
+
+[**Basic Configs**](#File-based-SQL-Source-Configs-basic-configs)
+
+
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| [hoodie.streamer.source.sql.file](#hoodiestreamersourcesqlfile) | (N/A) | SQL file path containing the SQL query to read source data.<br />`Config Param: SOURCE_SQL_FILE`<br />`Since Version: 0.14.0` |
+---
+
+
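A sketch of wiring this source: the file named by `hoodie.streamer.source.sql.file` holds the query that produces the source rows (the query and table names below are hypothetical):

```python
import tempfile

# Hypothetical source query; staging_db.trips_raw is an assumed table name.
query = "SELECT id, ts, rider FROM staging_db.trips_raw"

# Write the query to a .sql file and point the streamer config at it.
with tempfile.NamedTemporaryFile("w", suffix=".sql", delete=False) as f:
    f.write(query)
    sql_path = f.name

streamer_props = {"hoodie.streamer.source.sql.file": sql_path}

# The source would read the file back and execute the query it contains.
with open(sql_path) as f:
    loaded = f.read()
```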
#### SQL Source Configs {#SQL-Source-Configs}
Configurations controlling the behavior of SQL source in Hudi Streamer.
diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index 1271fb9822c..728a1e61409 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -5,12 +5,13 @@ permalink: /docs/configurations.html
summary: This page covers the different ways of configuring your job to
write/read Hudi tables. At a high level, you can control behaviour at few
levels.
toc_min_heading_level: 2
toc_max_heading_level: 4
-last_modified_at: 2024-04-15T09:56:05.395
+last_modified_at: 2024-04-19T18:21:42.86
---
This page covers the different ways of configuring your job to write/read Hudi
tables. At a high level, you can control behaviour at few levels.
+- [**Hudi Table Config**](#TABLE_CONFIG): Basic Hudi Table configuration parameters.
- [**Environment Config**](#ENVIRONMENT_CONFIG): Hudi supports passing
configurations via a configuration file `hudi-default.conf` in which each line
consists of a key and a value separated by whitespace or = sign. For example:
```
hoodie.datasource.hive_sync.mode jdbc
@@ -42,6 +43,63 @@ file `hudi-default.conf`. By default, Hudi would load the
configuration file und
specify a different configuration directory location by setting the
`HUDI_CONF_DIR` environment variable. This can be
useful for uniformly enforcing repeated configs (like Hive sync or write/index
tuning), across your entire data lake.
+## Hudi Table Config {#TABLE_CONFIG}
+Basic Hudi Table configuration parameters.
+
+
+### Hudi Table Basic Configs {#Hudi-Table-Basic-Configs}
+Configurations of the Hudi table, such as type of ingestion, storage formats, Hive table name, etc. These configurations are loaded from hoodie.properties; they are usually set when initializing a path as a Hudi base path and never change during the lifetime of the table.
+
+
+
+[**Basic Configs**](#Hudi-Table-Basic-Configs-basic-configs)
+
+
+| Config Name | Default | Description [...]
+| ----------- | ------- | ----------- [...]
+| [hoodie.bootstrap.base.path](#hoodiebootstrapbasepath) | (N/A) | Base path of the dataset that needs to be bootstrapped as a Hudi table<br />`Config Param: BOOTSTRAP_BASE_PATH` [...]
+| [hoodie.database.name](#hoodiedatabasename) | (N/A) | Database name that will be used for incremental query. If different databases have the same table name during incremental query, we can set it to limit the table name under a specific database<br />`Config Param: DATABASE_NAME` [...]
+| [hoodie.table.checksum](#hoodietablechecksum) | (N/A) | Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.<br />`Config Param: TABLE_CHECKSUM`<br />`Since Version: 0.11.0` [...]
+| [hoodie.table.create.schema](#hoodietablecreateschema) | (N/A) | Schema used when creating the table, for the first time.<br />`Config Param: CREATE_SCHEMA` [...]
+| [hoodie.table.index.defs.path](#hoodietableindexdefspath) | (N/A) | Absolute path where the index definitions are stored<br />`Config Param: INDEX_DEFINITION_PATH`<br />`Since Version: 1.0.0` [...]
+| [hoodie.table.keygenerator.class](#hoodietablekeygeneratorclass) | (N/A) | Key generator class property for the hoodie table<br />`Config Param: KEY_GENERATOR_CLASS_NAME` [...]
+| [hoodie.table.keygenerator.type](#hoodietablekeygeneratortype) | (N/A) | Key generator type to determine the key generator class<br />`Config Param: KEY_GENERATOR_TYPE`<br />`Since Version: 1.0.0` [...]
+| [hoodie.table.metadata.partitions](#hoodietablemetadatapartitions) | (N/A) | Comma-separated list of metadata partitions that have been completely built and are in sync with the data table. These partitions are ready for use by the readers<br />`Config Param: TABLE_METADATA_PARTITIONS`<br />`Since Version: 0.11.0` [...]
+| [hoodie.table.metadata.partitions.inflight](#hoodietablemetadatapartitionsinflight) | (N/A) | Comma-separated list of metadata partitions whose building is in progress. These partitions are not yet ready for use by the readers.<br />`Config Param: TABLE_METADATA_PARTITIONS_INFLIGHT`<br />`Since Version: 0.11.0` [...]
+| [hoodie.table.name](#hoodietablename) | (N/A) | Table name that will be used for registering with Hive. Needs to be the same across runs.<br />`Config Param: NAME` [...]
+| [hoodie.table.partition.fields](#hoodietablepartitionfields) | (N/A) | Fields used to partition the table. Concatenated values of these fields are used as the partition path, by invoking toString()<br />`Config Param: PARTITION_FIELDS` [...]
+| [hoodie.table.precombine.field](#hoodietableprecombinefield) | (N/A) | Field used in preCombining before actual write. By default, when two records have the same key value, the largest value for the precombine field, determined by Object.compareTo(..), is picked.<br />`Config Param: PRECOMBINE_FIELD` [...]
+| [hoodie.table.recordkey.fields](#hoodietablerecordkeyfields) | (N/A) | Columns used to uniquely identify the table. Concatenated values of these fields are used as the record key component of HoodieKey.<br />`Config Param: RECORDKEY_FIELDS` [...]
+| [hoodie.table.secondary.indexes.metadata](#hoodietablesecondaryindexesmetadata) | (N/A) | The metadata of secondary indexes<br />`Config Param: SECONDARY_INDEXES_METADATA`<br />`Since Version: 0.13.0` [...]
+| [hoodie.timeline.layout.version](#hoodietimelinelayoutversion) | (N/A) | Version of the timeline used by the table.<br />`Config Param: TIMELINE_LAYOUT_VERSION` [...]
+| [hoodie.archivelog.folder](#hoodiearchivelogfolder) | archived | Path under the meta folder to store archived timeline instants.<br />`Config Param: ARCHIVELOG_FOLDER` [...]
+| [hoodie.bootstrap.index.class](#hoodiebootstrapindexclass) | org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex | Implementation to use for mapping base files to bootstrap base files that contain actual data.<br />`Config Param: BOOTSTRAP_INDEX_CLASS_NAME` [...]
+| [hoodie.bootstrap.index.enable](#hoodiebootstrapindexenable) | true | Whether or not this is a bootstrapped table, with bootstrap base data and a mapping index defined. Default: true.<br />`Config Param: BOOTSTRAP_INDEX_ENABLE` [...]
+| [hoodie.bootstrap.index.type](#hoodiebootstrapindextype) | HFILE | Bootstrap index type determines which implementation to use for mapping base files to bootstrap base files that contain actual data.<br />`Config Param: BOOTSTRAP_INDEX_TYPE`<br />`Since Version: 1.0.0` [...]
+| [hoodie.compaction.payload.class](#hoodiecompactionpayloadclass) | org.apache.hudi.common.model.DefaultHoodieRecordPayload | Payload class to use for performing compactions, i.e. merge delta logs with the current base file and then produce a new base file.<br />`Config Param: PAYLOAD_CLASS_NAME` [...]
+| [hoodie.compaction.payload.type](#hoodiecompactionpayloadtype) | HOODIE_AVRO_DEFAULT | org.apache.hudi.common.model.RecordPayloadType: Payload to use for merging records. AWS_DMS_AVRO: Provides support for seamlessly applying changes captured via Amazon Database Migration Service onto S3. HOODIE_AVRO: A payload to wrap an existing Hoodie Avro Record. Useful to create a HoodieRecord over existing GenericReco [...]
+| [hoodie.compaction.record.merger.strategy](#hoodiecompactionrecordmergerstrategy) | eeb8d96f-b1e4-49fd-bbf8-28ac514178e5 | Id of the merger strategy. Hudi will pick HoodieRecordMerger implementations in hoodie.datasource.write.record.merger.impls which have the same merger strategy id<br />`Config Param: RECORD_MERGER_STRATEGY`<br />`Since Version: 0.13.0` [...]
+| [hoodie.datasource.write.hive_style_partitioning](#hoodiedatasourcewritehive_style_partitioning) | false | Flag to indicate whether to use Hive style partitioning. If set true, the names of partition folders follow <partition_column_name>=<partition_value> format. By default false (the names of partition folders are only partition values)<br />`Config Param: HIVE_STYLE_PARTITIONING_ENABLE` [...]
+| [hoodie.partition.metafile.use.base.format](#hoodiepartitionmetafileusebaseformat) | false | If true, partition metafiles are saved in the same format as base files for this dataset (e.g. Parquet / ORC). If false (default), partition metafiles are saved as properties files.<br />`Config Param: PARTITION_METAFILE_USE_BASE_FORMAT` [...]
+| [hoodie.populate.meta.fields](#hoodiepopulatemetafields) | true | When enabled, populates all meta fields. When disabled, no meta fields are populated and incremental queries will not be functional. This is only meant to be used for append-only/immutable data for batch processing<br />`Config Param: POPULATE_META_FIELDS` [...]
+| [hoodie.table.base.file.format](#hoodietablebasefileformat) | PARQUET | Base file format to store all the base file data.<br />`Config Param: BASE_FILE_FORMAT` [...]
+| [hoodie.table.cdc.enabled](#hoodietablecdcenabled) | false | When enabled, persists the change data if necessary, which can then be queried in CDC query mode.<br />`Config Param: CDC_ENABLED`<br />`Since Version: 0.13.0` [...]
+| [hoodie.table.cdc.supplemental.logging.mode](#hoodietablecdcsupplementalloggingmode) | DATA_BEFORE_AFTER | org.apache.hudi.common.table.cdc.HoodieCDCSupplementalLoggingMode: Change log capture supplemental logging mode. The supplemental log is used for accelerating the generation of change log details. OP_KEY_ONLY: Only keeping record keys in the supplemental logs, so the reader needs to figure out the update before image and af [...]
+| [hoodie.table.log.file.format](#hoodietablelogfileformat) | HOODIE_LOG | Log format used for the delta logs.<br />`Config Param: LOG_FILE_FORMAT` [...]
+| [hoodie.table.multiple.base.file.formats.enable](#hoodietablemultiplebasefileformatsenable) | false | When set to true, the table can support reading and writing multiple base file formats.<br />`Config Param: MULTIPLE_BASE_FILE_FORMATS_ENABLE`<br />`Since Version: 1.0.0` [...]
+| [hoodie.table.timeline.timezone](#hoodietabletimelinetimezone) | LOCAL | User can set the hoodie commit timeline timezone, such as UTC, LOCAL and so on. LOCAL is the default<br />`Config Param: TIMELINE_TIMEZONE` [...]
+| [hoodie.table.type](#hoodietabletype) | COPY_ON_WRITE | The table type for the underlying data, for this write. This can't change between writes.<br />`Config Param: TYPE` [...]
+| [hoodie.table.version](#hoodietableversion) | ZERO | Version of the table, used for running upgrade/downgrade steps between releases with potentially breaking/backwards compatible changes.<br />`Config Param: VERSION` [...]
+
+[**Advanced Configs**](#Hudi-Table-Basic-Configs-advanced-configs)
+
+
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| [hoodie.datasource.write.drop.partition.columns](#hoodiedatasourcewritedroppartitioncolumns) | false | When set to true, will not write the partition columns into Hudi. By default, false.<br />`Config Param: DROP_PARTITION_COLUMNS` |
+| [hoodie.datasource.write.partitionpath.urlencode](#hoodiedatasourcewritepartitionpathurlencode) | false | Should we URL-encode the partition path value before creating the folder structure.<br />`Config Param: URL_ENCODE_PARTITIONING` |
+---
+
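To illustrate what `hoodie.datasource.write.partitionpath.urlencode` guards against: a partition value containing `/` would otherwise create extra directory levels in the partition path. The encoding below mirrors the idea, not Hudi's exact implementation:

```python
from urllib.parse import quote

# A date-like partition value containing "/" (illustrative).
raw_value = "2024/04/19"

# URL-encode so the slashes don't become nested folders.
encoded = quote(raw_value, safe="")
```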
## Spark Datasource Configs {#SPARK_DATASOURCE}
These configs control the Hudi Spark Datasource, providing ability to define
keys/partitioning, pick out the write operation, specify how to merge records
or choosing query type to read.
@@ -764,6 +822,28 @@ Configurations that control compaction (merging of log
files onto a new base fil
---
+### Error table Configs {#Error-table-Configs}
+Configurations required for the error table.
+
+
+
+[**Basic Configs**](#Error-table-Configs-basic-configs)
+
+
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| [hoodie.errortable.base.path](#hoodieerrortablebasepath) | (N/A) | Base path for the error table, under which all error records would be stored.<br />`Config Param: ERROR_TABLE_BASE_PATH` |
+| [hoodie.errortable.target.table.name](#hoodieerrortabletargettablename) | (N/A) | Table name to be used for the error table<br />`Config Param: ERROR_TARGET_TABLE` |
+| [hoodie.errortable.write.class](#hoodieerrortablewriteclass) | (N/A) | Class which handles the error table writes. This config is used to configure a custom implementation for Error Table Writer. Specify the full class name of the custom error table writer as a value for this config<br />`Config Param: ERROR_TABLE_WRITE_CLASS` |
+| [hoodie.errortable.enable](#hoodieerrortableenable) | false | Config to enable the error table. If the config is enabled, all the records with processing errors in DeltaStreamer are transferred to the error table.<br />`Config Param: ERROR_TABLE_ENABLED` |
+| [hoodie.errortable.insert.shuffle.parallelism](#hoodieerrortableinsertshuffleparallelism) | 200 | Config to set insert shuffle parallelism. The config is similar to the hoodie.insert.shuffle.parallelism config but applies to the error table.<br />`Config Param: ERROR_TABLE_INSERT_PARALLELISM_VALUE` |
+| [hoodie.errortable.upsert.shuffle.parallelism](#hoodieerrortableupsertshuffleparallelism) | 200 | Config to set upsert shuffle parallelism. The config is similar to the hoodie.upsert.shuffle.parallelism config but applies to the error table.<br />`Config Param: ERROR_TABLE_UPSERT_PARALLELISM_VALUE` |
+| [hoodie.errortable.validate.recordcreation.enable](#hoodieerrortablevalidaterecordcreationenable) | true | Records that fail to be created due to key generation failure or other issues will be sent to the error table<br />`Config Param: ERROR_ENABLE_VALIDATE_RECORD_CREATION`<br />`Since Version: 0.14.2` |
+| [hoodie.errortable.validate.targetschema.enable](#hoodieerrortablevalidatetargetschemaenable) | false | Records with a schema mismatch with the target schema are sent to the error table.<br />`Config Param: ERROR_ENABLE_VALIDATE_TARGET_SCHEMA` |
+| [hoodie.errortable.write.failure.strategy](#hoodieerrortablewritefailurestrategy) | ROLLBACK_COMMIT | The config specifies the failure strategy if the error table write fails. Use one of - [ROLLBACK_COMMIT (Rollback the corresponding base table write commit for which the error events were triggered), LOG_ERROR (Error is logged but the base table write succeeds)]<br />`Config Param: ERROR_TABLE_WRITE_FAILURE_STRATEGY` |
+---
+
+
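Taken together, a minimal error-table setup in a Hudi Streamer properties file might look like the sketch below. The base path, table name, and chosen strategy are hypothetical values for illustration, not defaults:

```properties
# Illustrative sketch: route records that fail processing to an error table.
# The path and table name below are made-up example values.
hoodie.errortable.enable=true
hoodie.errortable.base.path=s3://my-bucket/hudi/error_tables
hoodie.errortable.target.table.name=orders_errors
# Log write failures instead of rolling back the base table commit.
hoodie.errortable.write.failure.strategy=LOG_ERROR
```
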
### Layout Configs {#Layout-Configs}
Configurations that control storage layout and data distribution, which define how the files are organized within a table.
@@ -1059,6 +1139,28 @@ Hudi maintains keys (record key + partition path) for uniquely identifying a par
---
+#### Timestamp-based key generator configs {#Timestamp-based-key-generator-configs}
+Configs used for TimestampBasedKeyGenerator, which relies on timestamps for the partition field. The field values are interpreted as timestamps rather than simply converted to strings when generating the partition path value for records. The record key is chosen by field name, as before.
+
+
+
+[**Advanced Configs**](#Timestamp-based-key-generator-configs-advanced-configs)
+
+
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| [hoodie.keygen.timebased.timestamp.type](#hoodiekeygentimebasedtimestamptype) | (N/A) | Timestamp type of the field, which should be one of the supported timestamp types: `UNIX_TIMESTAMP`, `DATE_STRING`, `MIXED`, `EPOCHMILLISECONDS`, `SCALAR`.<br />`Config Param: TIMESTAMP_TYPE_FIELD` |
+| [hoodie.keygen.datetime.parser.class](#hoodiekeygendatetimeparserclass) | org.apache.hudi.keygen.parser.HoodieDateTimeParser | Date time parser class name.<br />`Config Param: DATE_TIME_PARSER` |
+| [hoodie.keygen.timebased.input.dateformat](#hoodiekeygentimebasedinputdateformat) | | Input date format such as `yyyy-MM-dd'T'HH:mm:ss.SSSZ`.<br />`Config Param: TIMESTAMP_INPUT_DATE_FORMAT` |
+| [hoodie.keygen.timebased.input.dateformat.list.delimiter.regex](#hoodiekeygentimebasedinputdateformatlistdelimiterregex) | , | The delimiter for the allowed input date format list, usually `,`.<br />`Config Param: TIMESTAMP_INPUT_DATE_FORMAT_LIST_DELIMITER_REGEX` |
+| [hoodie.keygen.timebased.input.timezone](#hoodiekeygentimebasedinputtimezone) | UTC | Timezone of the input timestamp, such as `UTC`.<br />`Config Param: TIMESTAMP_INPUT_TIMEZONE_FORMAT` |
+| [hoodie.keygen.timebased.output.dateformat](#hoodiekeygentimebasedoutputdateformat) | | Output date format such as `yyyy-MM-dd'T'HH:mm:ss.SSSZ`.<br />`Config Param: TIMESTAMP_OUTPUT_DATE_FORMAT` |
+| [hoodie.keygen.timebased.output.timezone](#hoodiekeygentimebasedoutputtimezone) | UTC | Timezone of the output timestamp, such as `UTC`.<br />`Config Param: TIMESTAMP_OUTPUT_TIMEZONE_FORMAT` |
+| [hoodie.keygen.timebased.timestamp.scalar.time.unit](#hoodiekeygentimebasedtimestampscalartimeunit) | SECONDS | When timestamp type `SCALAR` is used, this specifies the time unit, with allowed units given by the `TimeUnit` enum (`NANOSECONDS`, `MICROSECONDS`, `MILLISECONDS`, `SECONDS`, `MINUTES`, `HOURS`, `DAYS`).<br />`Config Param: INPUT_TIME_UNIT` |
+| [hoodie.keygen.timebased.timezone](#hoodiekeygentimebasedtimezone) | UTC | Timezone of both input and output timestamps if they are the same, such as `UTC`. Use `hoodie.keygen.timebased.input.timezone` and `hoodie.keygen.timebased.output.timezone` instead if the input and output timezones differ.<br />`Config Param: TIMESTAMP_TIMEZONE_FORMAT` |
+---
+
+
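As an example of how these options combine, the sketch below partitions a table by day from an epoch-millisecond field. The key generator class config and the chosen output format are illustrative values under that assumption, not defaults from the table above:

```properties
# Illustrative sketch: day-grained partitions from an epoch-millis field.
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
hoodie.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
# e.g. 1713571200000 -> partition path 2024/04/20
hoodie.keygen.timebased.output.dateformat=yyyy/MM/dd
hoodie.keygen.timebased.timezone=UTC
```
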
### Index Configs {#INDEX}
Configurations that control indexing behavior, which tags incoming records as either inserts or updates to older records.
@@ -1977,6 +2079,27 @@ Configurations controlling the behavior of S3 source in Hudi Streamer.
---
+#### File-based SQL Source Configs {#File-based-SQL-Source-Configs}
+Configurations controlling the behavior of the File-based SQL Source in Hudi Streamer.
+
+
+
+[**Basic Configs**](#File-based-SQL-Source-Configs-basic-configs)
+
+
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| [hoodie.streamer.source.sql.file](#hoodiestreamersourcesqlfile) | (N/A) | SQL file path containing the SQL query to read source data.<br />`Config Param: SOURCE_SQL_FILE`<br />`Since Version: 0.14.0` |
+
+[**Advanced Configs**](#File-based-SQL-Source-Configs-advanced-configs)
+
+
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| [hoodie.streamer.source.sql.checkpoint.emit](#hoodiestreamersourcesqlcheckpointemit) | false | Whether to emit the current epoch as the streamer checkpoint.<br />`Config Param: EMIT_EPOCH_CHECKPOINT`<br />`Since Version: 0.14.0` |
+---
+
+
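A minimal sketch of wiring up this source in a Hudi Streamer properties file follows; the file path is a made-up example, and the SQL file it points to would hold a query of your choosing (e.g. `SELECT * FROM some_db.some_table`):

```properties
# Illustrative sketch: read source data from a query stored in a file.
# The path below is hypothetical.
hoodie.streamer.source.sql.file=file:///tmp/source_query.sql
# Optionally emit the current epoch as the streamer checkpoint.
hoodie.streamer.source.sql.checkpoint.emit=false
```
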
#### SQL Source Configs {#SQL-Source-Configs}
Configurations controlling the behavior of SQL source in Hudi Streamer.