This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7de5edb [HUDI-2539] Update the config keys of 0.8.0 version in the docs to 0.9.0 (#3775)
7de5edb is described below
commit 7de5edb8656859d8fa19f0988fe686393f9b0570
Author: 董可伦 <[email protected]>
AuthorDate: Sat Oct 23 05:03:16 2021 +0800
[HUDI-2539] Update the config keys of 0.8.0 version in the docs to 0.9.0 (#3775)
---
.../version-0.9.0/concurrency_control.md | 16 +++----
.../versioned_docs/version-0.9.0/configurations.md | 10 ++--
website/versioned_docs/version-0.9.0/deployment.md | 8 ++--
.../versioned_docs/version-0.9.0/docker_demo.md | 2 +-
.../versioned_docs/version-0.9.0/querying_data.md | 12 ++---
.../version-0.9.0/quick-start-guide.md | 56 +++++++++++-----------
.../version-0.9.0/schema_evolution.md | 16 +++----
.../versioned_docs/version-0.9.0/writing_data.md | 34 ++++++-------
8 files changed, 77 insertions(+), 77 deletions(-)
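For readers porting their own snippets, the renames applied across these eight files can be collected into a single lookup from each 0.8.0-style constant (as it appears on the `-` lines) to the 0.9.0 expression that replaces it (as it appears on the `+` lines). The sketch below is a plain-Java table built only from the hunks in this commit; the class and method names are hypothetical and it is not part of any Hudi API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: old constant name -> 0.9.0 replacement, taken from this diff only.
final class HudiConfigRenames {
    private static final Map<String, String> RENAMES = new LinkedHashMap<>();

    static {
        // DataSourceWriteOptions: the "_OPT_KEY" suffix is dropped and ".key()" is appended
        RENAMES.put("RECORDKEY_FIELD_OPT_KEY", "RECORDKEY_FIELD.key()");
        RENAMES.put("PARTITIONPATH_FIELD_OPT_KEY", "PARTITIONPATH_FIELD.key()");
        RENAMES.put("PRECOMBINE_FIELD_OPT_KEY", "PRECOMBINE_FIELD.key()");
        RENAMES.put("OPERATION_OPT_KEY", "OPERATION.key()");
        RENAMES.put("TABLE_TYPE_OPT_KEY", "TABLE_TYPE.key()");
        RENAMES.put("INSERT_DROP_DUPS_OPT_KEY", "INSERT_DROP_DUPS.key()");
        RENAMES.put("URL_ENCODE_PARTITIONING_OPT_KEY", "URL_ENCODE_PARTITIONING.key()");
        RENAMES.put("HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY", "HIVE_PARTITION_EXTRACTOR_CLASS.key()");
        // Referenced through HoodieWriteConfig in the updated docs
        RENAMES.put("KEYGENERATOR_CLASS_OPT_KEY", "HoodieWriteConfig.KEYGENERATOR_CLASS_NAME.key()");
        RENAMES.put("PAYLOAD_CLASS_OPT_KEY", "HoodieWriteConfig.WRITE_PAYLOAD_CLASS_NAME.key()");
        // HoodieWriteConfig.TABLE_NAME (the "hoodie.table.name" config)
        RENAMES.put("TABLE_NAME", "TBL_NAME.key()");
        // DataSourceReadOptions
        RENAMES.put("QUERY_TYPE_OPT_KEY", "QUERY_TYPE.key()");
        RENAMES.put("BEGIN_INSTANTTIME_OPT_KEY", "BEGIN_INSTANTTIME.key()");
        RENAMES.put("END_INSTANTTIME_OPT_KEY", "END_INSTANTTIME.key()");
        RENAMES.put("INCR_PATH_GLOB_OPT_KEY", "INCR_PATH_GLOB.key()");
    }

    private HudiConfigRenames() {}

    /** Returns the 0.9.0 replacement for an old constant, or empty if this commit did not rename it. */
    static Optional<String> replacementFor(String oldConstant) {
        return Optional.ofNullable(RENAMES.get(oldConstant));
    }
}
```

Value constants such as `QUERY_TYPE_INCREMENTAL_OPT_VAL` are left unchanged by this commit, so the lookup returns an empty `Optional` for them.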
diff --git a/website/versioned_docs/version-0.9.0/concurrency_control.md b/website/versioned_docs/version-0.9.0/concurrency_control.md
index 96ff6eb..3a816b6 100644
--- a/website/versioned_docs/version-0.9.0/concurrency_control.md
+++ b/website/versioned_docs/version-0.9.0/concurrency_control.md
@@ -24,8 +24,8 @@ It may be helpful to understand the different guarantees provided by [write oper
## Single Writer Guarantees
- *UPSERT Guarantee*: The target table will NEVER show duplicates.
- - *INSERT Guarantee*: The target table wilL NEVER have duplicates if [dedup](/docs/configurations#INSERT_DROP_DUPS_OPT_KEY) is enabled.
- - *BULK_INSERT Guarantee*: The target table will NEVER have duplicates if [dedup](/docs/configurations#INSERT_DROP_DUPS_OPT_KEY) is enabled.
+ - *INSERT Guarantee*: The target table will NEVER have duplicates if [dedup](/docs/configurations#INSERT_DROP_DUPS) is enabled.
+ - *BULK_INSERT Guarantee*: The target table will NEVER have duplicates if [dedup](/docs/configurations#INSERT_DROP_DUPS) is enabled.
- *INCREMENTAL PULL Guarantee*: Data consumption and checkpoints are NEVER out of order.
## Multi Writer Guarantees
@@ -33,8 +33,8 @@ It may be helpful to understand the different guarantees provided by [write oper
With multiple writers using OCC, some of the above guarantees change as follows
- *UPSERT Guarantee*: The target table will NEVER show duplicates.
-- *INSERT Guarantee*: The target table MIGHT have duplicates even if [dedup](/docs/configurations#INSERT_DROP_DUPS_OPT_KEY) is enabled.
-- *BULK_INSERT Guarantee*: The target table MIGHT have duplicates even if [dedup](/docs/configurations#INSERT_DROP_DUPS_OPT_KEY) is enabled.
+- *INSERT Guarantee*: The target table MIGHT have duplicates even if [dedup](/docs/configurations#INSERT_DROP_DUPS) is enabled.
+- *BULK_INSERT Guarantee*: The target table MIGHT have duplicates even if [dedup](/docs/configurations#INSERT_DROP_DUPS) is enabled.
- *INCREMENTAL PULL Guarantee*: Data consumption and checkpoints MIGHT be out of order due to multiple writer jobs finishing at different times.
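Why dedup stops being a hard guarantee with multiple writers can be seen in a small in-memory model: each writer drops incoming keys that already exist in the snapshot it read, but two concurrent writers read the same snapshot, so both keep the same new key and both commits land. This is a simplified illustration with hypothetical names, not Hudi's implementation; in particular it assumes, as a simplification, that the conflict check does not reject two commits that only add new records.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model (hypothetical): a table is a list of record keys.
final class DedupRaceSketch {
    private DedupRaceSketch() {}

    /** "Dedup": drop incoming keys already present in the snapshot this writer read. */
    static List<String> dedupAgainst(Set<String> snapshot, List<String> incoming) {
        List<String> kept = new ArrayList<>();
        for (String key : incoming) {
            if (!snapshot.contains(key)) {
                kept.add(key);
            }
        }
        return kept;
    }

    /** Two writers read the same committed state, dedup independently, and both commit. */
    static List<String> concurrentInserts(List<String> table, List<String> batchA, List<String> batchB) {
        Set<String> snapshot = new HashSet<>(table); // both writers see this same snapshot
        List<String> result = new ArrayList<>(table);
        // Assumption: neither commit is rejected because both only add new records
        result.addAll(dedupAgainst(snapshot, batchA));
        result.addAll(dedupAgainst(snapshot, batchB));
        return result;
    }
}
```

With an existing key `k1` and both writers inserting `k2`, the model ends with `k2` twice, which is the "MIGHT have duplicates" case above; a single writer applying the two batches one after the other would have seen `k2` in its snapshot and dropped it.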
## Enabling Multi Writing
@@ -78,16 +78,16 @@ Following is an example of how to use optimistic_concurrency_control via spark d
```java
inputDF.write.format("hudi")
.options(getQuickstartWriteConfigs)
- .option(PRECOMBINE_FIELD_OPT_KEY, "ts")
+ .option(PRECOMBINE_FIELD.key(), "ts")
.option("hoodie.cleaner.policy.failed.writes", "LAZY")
.option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
.option("hoodie.write.lock.zookeeper.url", "zookeeper")
.option("hoodie.write.lock.zookeeper.port", "2181")
.option("hoodie.write.lock.zookeeper.lock_key", "test_table")
.option("hoodie.write.lock.zookeeper.base_path", "/test")
- .option(RECORDKEY_FIELD_OPT_KEY, "uuid")
- .option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath")
- .option(TABLE_NAME, tableName)
+ .option(RECORDKEY_FIELD.key(), "uuid")
+ .option(PARTITIONPATH_FIELD.key(), "partitionpath")
+ .option(TBL_NAME.key(), tableName)
.mode(Overwrite)
.save(basePath)
```
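Since the updated snippet passes `.key()` strings, the same write can also be described by a plain `Map<String, String>`, which is convenient when options come from a properties file. The sketch below spells out the string keys; the four datasource keys (`hoodie.datasource.write.precombine.field` and so on) are what we understand the 0.9.0 constants to resolve to and should be checked against the configuration reference, while the lock and concurrency keys are copied from the snippet above. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper mirroring the OCC snippet above with literal config keys.
final class OccWriteOptions {
    private OccWriteOptions() {}

    static Map<String, String> forTable(String tableName) {
        Map<String, String> opts = new LinkedHashMap<>();
        // Assumed string values of the 0.9.0 constants used in the snippet
        opts.put("hoodie.datasource.write.precombine.field", "ts");                // PRECOMBINE_FIELD.key()
        opts.put("hoodie.datasource.write.recordkey.field", "uuid");               // RECORDKEY_FIELD.key()
        opts.put("hoodie.datasource.write.partitionpath.field", "partitionpath");  // PARTITIONPATH_FIELD.key()
        opts.put("hoodie.table.name", tableName);                                  // TBL_NAME.key()
        // Copied verbatim from the snippet
        opts.put("hoodie.cleaner.policy.failed.writes", "LAZY");
        opts.put("hoodie.write.concurrency.mode", "optimistic_concurrency_control");
        opts.put("hoodie.write.lock.zookeeper.url", "zookeeper");
        opts.put("hoodie.write.lock.zookeeper.port", "2181");
        opts.put("hoodie.write.lock.zookeeper.lock_key", "test_table");
        opts.put("hoodie.write.lock.zookeeper.base_path", "/test");
        return opts;
    }
}
```

With Spark's Java API the map can be handed to `DataFrameWriter.options(java.util.Map)`, for example `inputDF.write().format("hudi").options(OccWriteOptions.forTable(tableName)).mode(SaveMode.Overwrite).save(basePath)`; the quick-start defaults from `getQuickstartWriteConfigs` would still have to be added separately.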
diff --git a/website/versioned_docs/version-0.9.0/configurations.md b/website/versioned_docs/version-0.9.0/configurations.md
index 97b51a1..a5da1cb 100644
--- a/website/versioned_docs/version-0.9.0/configurations.md
+++ b/website/versioned_docs/version-0.9.0/configurations.md
@@ -109,10 +109,10 @@ You can pass down any of the WriteClient level configs directly using `options()
inputDF.write()
.format("org.apache.hudi")
.options(clientOpts) // any of the Hudi client opts can be passed in as well
-.option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
-.option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
-.option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
-.option(HoodieWriteConfig.TABLE_NAME, tableName)
+.option(DataSourceWriteOptions.RECORDKEY_FIELD.key(), "_row_key")
+.option(DataSourceWriteOptions.PARTITIONPATH_FIELD.key(), "partition")
+.option(DataSourceWriteOptions.PRECOMBINE_FIELD.key(), "timestamp")
+.option(HoodieWriteConfig.TBL_NAME.key(), tableName)
.mode(SaveMode.Append)
.save(basePath);
```
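The reason every call now ends in `.key()`: in 0.9.0 these constants are `ConfigProperty` objects that bundle a config key with a default value, whereas Spark's `option(String, String)` needs the key string itself. The class below is a deliberately minimal stand-in written for illustration only (it is not Hudi's `ConfigProperty`, which carries more metadata); the key strings are assumptions, and the defaults are the ones listed in the `writing_data.md` hunk of this commit.

```java
// Minimal stand-in for illustration; not Hudi's ConfigProperty class.
final class ConfigPropertySketch<T> {
    private final String key;
    private final T defaultValue;

    private ConfigPropertySketch(String key, T defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    static <T> ConfigPropertySketch<T> of(String key, T defaultValue) {
        return new ConfigPropertySketch<>(key, defaultValue);
    }

    /** The plain string that goes into DataFrameWriter.option(key, value). */
    String key() {
        return key;
    }

    T defaultValue() {
        return defaultValue;
    }
}

final class WriteOptionsSketch {
    private WriteOptionsSketch() {}

    // Key strings are assumptions; defaults match the writing_data.md text
    static final ConfigPropertySketch<String> RECORDKEY_FIELD =
        ConfigPropertySketch.of("hoodie.datasource.write.recordkey.field", "uuid");
    static final ConfigPropertySketch<String> PRECOMBINE_FIELD =
        ConfigPropertySketch.of("hoodie.datasource.write.precombine.field", "ts");
}
```

`.option(WriteOptionsSketch.RECORDKEY_FIELD.key(), "_row_key")` compiles, while passing the object itself does not, because every `DataFrameWriter.option` overload takes a `String` key.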
@@ -843,7 +843,7 @@ By default insert will accept duplicates, to gain extra performance<br></br>
> #### hoodie.table.name
> Table name to register to Hive metastore<br></br>
> **Default Value**: N/A (Required)<br></br>
-> `Config Param: TABLE_NAME`<br></br>
+> `Config Param: TBL_NAME`<br></br>
---
diff --git a/website/versioned_docs/version-0.9.0/deployment.md b/website/versioned_docs/version-0.9.0/deployment.md
index 20bf723..9299bf2 100644
--- a/website/versioned_docs/version-0.9.0/deployment.md
+++ b/website/versioned_docs/version-0.9.0/deployment.md
@@ -136,10 +136,10 @@ Here is an example invocation using spark datasource
inputDF.write()
.format("org.apache.hudi")
.options(clientOpts) // any of the Hudi client opts can be passed in as well
- .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
- .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
- .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
- .option(HoodieWriteConfig.TABLE_NAME, tableName)
+ .option(DataSourceWriteOptions.RECORDKEY_FIELD.key(), "_row_key")
+ .option(DataSourceWriteOptions.PARTITIONPATH_FIELD.key(), "partition")
+ .option(DataSourceWriteOptions.PRECOMBINE_FIELD.key(), "timestamp")
+ .option(HoodieWriteConfig.TBL_NAME.key(), tableName)
.mode(SaveMode.Append)
.save(basePath);
```
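The snippet leaves `clientOpts` undefined. One way to build it, sketched under the assumption that WriteClient-level settings such as `hoodie.upsert.shuffle.parallelism` and `hoodie.insert.shuffle.parallelism` are what you want to pass, is a plain map; the helper names and the parallelism values are hypothetical placeholders. The `merge` helper models the fact that a later `.option(...)` call for the same key replaces whatever `.options(clientOpts)` set earlier.

```java
import java.util.HashMap;
import java.util.Map;

final class ClientOptsSketch {
    private ClientOptsSketch() {}

    /** Hypothetical builder for the clientOpts map passed to .options(...). */
    static Map<String, String> clientOpts(int parallelism) {
        Map<String, String> opts = new HashMap<>();
        // WriteClient-level configs; the values are placeholders to tune per cluster
        opts.put("hoodie.upsert.shuffle.parallelism", String.valueOf(parallelism));
        opts.put("hoodie.insert.shuffle.parallelism", String.valueOf(parallelism));
        return opts;
    }

    /** Later entries win, like explicit .option(...) calls made after .options(clientOpts). */
    static Map<String, String> merge(Map<String, String> base, Map<String, String> overrides) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(overrides);
        return merged;
    }
}
```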
diff --git a/website/versioned_docs/version-0.9.0/docker_demo.md b/website/versioned_docs/version-0.9.0/docker_demo.md
index 1754d75..89a1ff5 100644
--- a/website/versioned_docs/version-0.9.0/docker_demo.md
+++ b/website/versioned_docs/version-0.9.0/docker_demo.md
@@ -895,7 +895,7 @@ scala> import org.apache.hudi.DataSourceReadOptions
import org.apache.hudi.DataSourceReadOptions
# In the below query, 20180925045257 is the first commit's timestamp
-scala> val hoodieIncViewDF = spark.read.format("org.apache.hudi").option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL).option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "20180924064621").load("/user/hive/warehouse/stock_ticks_cow")
+scala> val hoodieIncViewDF = spark.read.format("org.apache.hudi").option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL).option(DataSourceReadOptions.BEGIN_INSTANTTIME.key(), "20180924064621").load("/user/hive/warehouse/stock_ticks_cow")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes#StaticLoggerBinder for further details.
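The one-line Scala query above is easier to follow with its options spelled out. Assuming that in 0.9.0 `QUERY_TYPE.key()` resolves to `hoodie.datasource.query.type`, `QUERY_TYPE_INCREMENTAL_OPT_VAL` to `incremental`, and `BEGIN_INSTANTTIME.key()` to `hoodie.datasource.read.begin.instanttime` (worth confirming against the configuration reference), the same read options look like this as a Java map; the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: read options for an incremental query starting after an instant.
final class IncrementalReadOptions {
    private IncrementalReadOptions() {}

    static Map<String, String> since(String beginInstantTime) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.datasource.query.type", "incremental");                 // QUERY_TYPE / QUERY_TYPE_INCREMENTAL_OPT_VAL
        opts.put("hoodie.datasource.read.begin.instanttime", beginInstantTime);  // BEGIN_INSTANTTIME
        return opts;
    }
}
```

The map can be passed to Spark's `DataFrameReader.options(java.util.Map)`, e.g. `spark.read().format("org.apache.hudi").options(IncrementalReadOptions.since("20180924064621")).load("/user/hive/warehouse/stock_ticks_cow")`.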
diff --git a/website/versioned_docs/version-0.9.0/querying_data.md b/website/versioned_docs/version-0.9.0/querying_data.md
index 8a4399a..179da8c 100644
--- a/website/versioned_docs/version-0.9.0/querying_data.md
+++ b/website/versioned_docs/version-0.9.0/querying_data.md
@@ -10,8 +10,8 @@ Conceptually, Hudi stores data physically once on DFS, while providing 3 differe
Once the table is synced to the Hive metastore, it provides external Hive tables backed by Hudi's custom inputformats. Once the proper hudi
bundle has been installed, the table can be queried by popular query engines like Hive, Spark SQL, Spark Datasource API and PrestoDB.
-Specifically, following Hive tables are registered based off [table name](/docs/configurations#TABLE_NAME_OPT_KEY)
-and [table type](/docs/configurations#TABLE_TYPE_OPT_KEY) configs passed during write.
+Specifically, the following Hive tables are registered based on the [table name](/docs/configurations#TABLE_NAME)
+and [table type](/docs/configurations#TABLE_TYPE) configs passed during write.
If `table name = hudi_trips` and `table type = COPY_ON_WRITE`, then we get:
- `hudi_trips` supports snapshot query and incremental query on the table backed by `HoodieParquetInputFormat`, exposing purely columnar data.
@@ -151,7 +151,7 @@ This method can be used to retrieve the data table at the present point in time.
val hudiIncQueryDF = spark
.read()
.format("hudi")
- .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL())
+ .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL())
.load(tablePath)
```
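`QUERY_TYPE.key()` takes one of three values. The enum below pairs each `*_OPT_VAL` constant with the string we understand it to hold in 0.9.0 (an assumption to verify against `DataSourceReadOptions`), which makes it possible to reject a mistyped raw-string query type before the read starts; the enum itself is hypothetical and matches values exactly.

```java
// Hypothetical enum; string values are assumed to match the 0.9.0 *_OPT_VAL constants.
enum QueryType {
    SNAPSHOT("snapshot"),             // QUERY_TYPE_SNAPSHOT_OPT_VAL
    READ_OPTIMIZED("read_optimized"), // QUERY_TYPE_READ_OPTIMIZED_OPT_VAL
    INCREMENTAL("incremental");       // QUERY_TYPE_INCREMENTAL_OPT_VAL

    private final String value;

    QueryType(String value) {
        this.value = value;
    }

    String value() {
        return value;
    }

    /** Parses a configured value, rejecting anything that is not a known query type. */
    static QueryType fromValue(String raw) {
        for (QueryType t : values()) {
            if (t.value.equals(raw)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown query type: " + raw);
    }
}
```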
@@ -163,9 +163,9 @@ The following snippet shows how to obtain all records changed after `beginInstan
```java
Dataset<Row> hudiIncQueryDF = spark.read()
.format("org.apache.hudi")
- .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
- .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), <beginInstantTime>)
- .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(), "/year=2020/month=*/day=*") // Optional, use glob pattern if querying certain partitions
+ .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
+ .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key(), <beginInstantTime>)
+ .option(DataSourceReadOptions.INCR_PATH_GLOB.key(), "/year=2020/month=*/day=*") // Optional, use glob pattern if querying certain partitions
.load(tablePath); // For incremental query, pass in the root/base path of table
hudiIncQueryDF.createOrReplaceTempView("hudi_trips_incremental")
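The `<beginInstantTime>` placeholder bounds the incremental read from below. As we read the 0.9.0 option descriptions, a record is returned when its commit time is strictly greater than `BEGIN_INSTANTTIME` and, when `END_INSTANTTIME` is also set (as in the quick-start point-in-time query), less than or equal to it. Because instant times in this version are fixed-width `yyyyMMddHHmmss` strings, plain string comparison orders them correctly. The in-memory filter below models only that window; the class and method names are hypothetical and nothing here reads a Hudi table.

```java
import java.util.List;
import java.util.stream.Collectors;

final class IncrementalWindow {
    private IncrementalWindow() {}

    /**
     * Keeps commit times in (begin, end]; a null end means "up to the latest commit".
     * Fixed-width yyyyMMddHHmmss strings compare correctly as plain strings.
     */
    static List<String> select(List<String> commitTimes, String begin, String end) {
        return commitTimes.stream()
            .filter(t -> t.compareTo(begin) > 0)                  // begin is exclusive
            .filter(t -> end == null || t.compareTo(end) <= 0)    // end is inclusive
            .collect(Collectors.toList());
    }
}
```

In this model, choosing the second-to-last commit as `begin`, as the quick-start snippets do with `commits(commits.length - 2)`, leaves only the latest commit in the window.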
diff --git a/website/versioned_docs/version-0.9.0/quick-start-guide.md b/website/versioned_docs/version-0.9.0/quick-start-guide.md
index 47c562d..ea10e9e 100644
--- a/website/versioned_docs/version-0.9.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.9.0/quick-start-guide.md
@@ -353,10 +353,10 @@ val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
df.write.format("hudi").
options(getQuickstartWriteConfigs).
- option(PRECOMBINE_FIELD_OPT_KEY, "ts").
- option(RECORDKEY_FIELD_OPT_KEY, "uuid").
- option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
- option(TABLE_NAME, tableName).
+ option(PRECOMBINE_FIELD.key(), "ts").
+ option(RECORDKEY_FIELD.key(), "uuid").
+ option(PARTITIONPATH_FIELD.key(), "partitionpath").
+ option(TBL_NAME.key(), tableName).
mode(Overwrite).
save(basePath)
```
@@ -584,10 +584,10 @@ val updates = convertToStringList(dataGen.generateUpdates(10))
val df = spark.read.json(spark.sparkContext.parallelize(updates, 2))
df.write.format("hudi").
options(getQuickstartWriteConfigs).
- option(PRECOMBINE_FIELD_OPT_KEY, "ts").
- option(RECORDKEY_FIELD_OPT_KEY, "uuid").
- option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
- option(TABLE_NAME, tableName).
+ option(PRECOMBINE_FIELD.key(), "ts").
+ option(RECORDKEY_FIELD.key(), "uuid").
+ option(PARTITIONPATH_FIELD.key(), "partitionpath").
+ option(TBL_NAME.key(), tableName).
mode(Append).
save(basePath)
```
@@ -726,8 +726,8 @@ val beginTime = commits(commits.length - 2) // commit time we are interested in
// incrementally query data
val tripsIncrementalDF = spark.read.format("hudi").
- option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
- option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
+ option(QUERY_TYPE.key(), QUERY_TYPE_INCREMENTAL_OPT_VAL).
+ option(BEGIN_INSTANTTIME.key(), beginTime).
load(basePath)
tripsIncrementalDF.createOrReplaceTempView("hudi_trips_incremental")
@@ -791,9 +791,9 @@ val endTime = commits(commits.length - 2) // commit time we are interested in
//incrementally query data
val tripsPointInTimeDF = spark.read.format("hudi").
- option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
- option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
- option(END_INSTANTTIME_OPT_KEY, endTime).
+ option(QUERY_TYPE.key(), QUERY_TYPE_INCREMENTAL_OPT_VAL).
+ option(BEGIN_INSTANTTIME.key(), beginTime).
+ option(END_INSTANTTIME.key(), endTime).
load(basePath)
tripsPointInTimeDF.createOrReplaceTempView("hudi_trips_point_in_time")
spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from hudi_trips_point_in_time where fare > 20.0").show()
@@ -850,11 +850,11 @@ val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2))
df.write.format("hudi").
options(getQuickstartWriteConfigs).
- option(OPERATION_OPT_KEY,"delete").
- option(PRECOMBINE_FIELD_OPT_KEY, "ts").
- option(RECORDKEY_FIELD_OPT_KEY, "uuid").
- option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
- option(TABLE_NAME, tableName).
+ option(OPERATION.key(),"delete").
+ option(PRECOMBINE_FIELD.key(), "ts").
+ option(RECORDKEY_FIELD.key(), "uuid").
+ option(PARTITIONPATH_FIELD.key(), "partitionpath").
+ option(TBL_NAME.key(), tableName).
mode(Append).
save(basePath)
@@ -960,11 +960,11 @@ val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
df.write.format("hudi").
options(getQuickstartWriteConfigs).
- option(OPERATION_OPT_KEY,"insert_overwrite_table").
- option(PRECOMBINE_FIELD_OPT_KEY, "ts").
- option(RECORDKEY_FIELD_OPT_KEY, "uuid").
- option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
- option(TABLE_NAME, tableName).
+ option(OPERATION.key(),"insert_overwrite_table").
+ option(PRECOMBINE_FIELD.key(), "ts").
+ option(RECORDKEY_FIELD.key(), "uuid").
+ option(PARTITIONPATH_FIELD.key(), "partitionpath").
+ option(TBL_NAME.key(), tableName).
mode(Append).
save(basePath)
@@ -1018,11 +1018,11 @@ val df = spark.
filter("partitionpath = 'americas/united_states/san_francisco'")
df.write.format("hudi").
options(getQuickstartWriteConfigs).
- option(OPERATION_OPT_KEY,"insert_overwrite").
- option(PRECOMBINE_FIELD_OPT_KEY, "ts").
- option(RECORDKEY_FIELD_OPT_KEY, "uuid").
- option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
- option(TABLE_NAME, tableName).
+ option(OPERATION.key(),"insert_overwrite").
+ option(PRECOMBINE_FIELD.key(), "ts").
+ option(RECORDKEY_FIELD.key(), "uuid").
+ option(PARTITIONPATH_FIELD.key(), "partitionpath").
+ option(TBL_NAME.key(), tableName).
mode(Append).
save(basePath)
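The last two hunks differ only in the `OPERATION.key()` value. As these operations are commonly described, `insert_overwrite` replaces just the partitions present in the incoming batch (here only `americas/united_states/san_francisco`), while `insert_overwrite_table` replaces the entire table. The model below captures that difference with a map from partition path to record keys; it is an illustration with hypothetical names, not a description of how Hudi rewrites files.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model: partition path -> record keys in that partition.
final class OverwriteSketch {
    private OverwriteSketch() {}

    /** insert_overwrite: partitions in the batch are replaced; all other partitions are kept. */
    static Map<String, List<String>> insertOverwrite(Map<String, List<String>> table,
                                                     Map<String, List<String>> batch) {
        Map<String, List<String>> result = new TreeMap<>(table);
        result.putAll(batch);
        return result;
    }

    /** insert_overwrite_table: existing contents are discarded and the batch becomes the table. */
    static Map<String, List<String>> insertOverwriteTable(Map<String, List<String>> table,
                                                          Map<String, List<String>> batch) {
        return new TreeMap<>(batch);
    }
}
```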
diff --git a/website/versioned_docs/version-0.9.0/schema_evolution.md b/website/versioned_docs/version-0.9.0/schema_evolution.md
index bd80f0d..f999e70 100644
--- a/website/versioned_docs/version-0.9.0/schema_evolution.md
+++ b/website/versioned_docs/version-0.9.0/schema_evolution.md
@@ -87,11 +87,11 @@ scala> val data1 = Seq(Row("row_1", "part_0", 0L, "bob", "v_0", 0),
scala> var dfFromData1 = spark.createDataFrame(data1, schema)
scala> dfFromData1.write.format("hudi").
| options(getQuickstartWriteConfigs).
- | option(PRECOMBINE_FIELD_OPT_KEY.key, "preComb").
- | option(RECORDKEY_FIELD_OPT_KEY.key, "rowId").
- | option(PARTITIONPATH_FIELD_OPT_KEY.key, "partitionId").
+ | option(PRECOMBINE_FIELD.key(), "preComb").
+ | option(RECORDKEY_FIELD.key(), "rowId").
+ | option(PARTITIONPATH_FIELD.key(), "partitionId").
| option("hoodie.index.type","SIMPLE").
- | option(TABLE_NAME.key, tableName).
+ | option(TBL_NAME.key(), tableName).
| mode(Overwrite).
| save(basePath)
@@ -147,11 +147,11 @@ scala> val data2 = Seq(Row("row_2", "part_0", 5L, "john", "v_3", 3L, "newField_1
scala> var dfFromData2 = spark.createDataFrame(data2, newSchema)
scala> dfFromData2.write.format("hudi").
| options(getQuickstartWriteConfigs).
- | option(PRECOMBINE_FIELD_OPT_KEY.key, "preComb").
- | option(RECORDKEY_FIELD_OPT_KEY.key, "rowId").
- | option(PARTITIONPATH_FIELD_OPT_KEY.key, "partitionId").
+ | option(PRECOMBINE_FIELD.key(), "preComb").
+ | option(RECORDKEY_FIELD.key(), "rowId").
+ | option(PARTITIONPATH_FIELD.key(), "partitionId").
| option("hoodie.index.type","SIMPLE").
- | option(TABLE_NAME.key, tableName).
+ | option(TBL_NAME.key(), tableName).
| mode(Append).
| save(basePath)
diff --git a/website/versioned_docs/version-0.9.0/writing_data.md b/website/versioned_docs/version-0.9.0/writing_data.md
index c0f1da9..3c406b2 100644
--- a/website/versioned_docs/version-0.9.0/writing_data.md
+++ b/website/versioned_docs/version-0.9.0/writing_data.md
@@ -222,28 +222,28 @@ The `hudi-spark` module offers the DataSource API to write (and read) a Spark Da
**`DataSourceWriteOptions`**:
-**RECORDKEY_FIELD_OPT_KEY** (Required): Primary key field(s). Record keys uniquely identify a record/row within each partition. If one wants to have a global uniqueness, there are two options. You could either make the dataset non-partitioned, or, you can leverage Global indexes to ensure record keys are unique irrespective of the partition path. Record keys can either be a single column or refer to multiple columns. `KEYGENERATOR_CLASS_OPT_KEY` property should be set accordingly based o [...]
+**RECORDKEY_FIELD** (Required): Primary key field(s). Record keys uniquely identify a record/row within each partition. For global uniqueness there are two options: make the dataset non-partitioned, or leverage Global indexes to ensure record keys are unique irrespective of the partition path. Record keys can either be a single column or refer to multiple columns. `HoodieWriteConfig.KEYGENERATOR_CLASS_NAME` property should be set accordingly [...]
Default value: `"uuid"`<br/>
-**PARTITIONPATH_FIELD_OPT_KEY** (Required): Columns to be used for partitioning the table. To prevent partitioning, provide empty string as value eg: `""`. Specify partitioning/no partitioning using `KEYGENERATOR_CLASS_OPT_KEY`. If partition path needs to be url encoded, you can set `URL_ENCODE_PARTITIONING_OPT_KEY`. If synchronizing to hive, also specify using `HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY.`<br/>
+**PARTITIONPATH_FIELD** (Required): Columns to be used for partitioning the table. To prevent partitioning, provide an empty string as the value, e.g. `""`. Specify partitioning/no partitioning using `HoodieWriteConfig.KEYGENERATOR_CLASS_NAME`. If the partition path needs to be URL encoded, you can set `URL_ENCODE_PARTITIONING`. If synchronizing to Hive, also specify `HIVE_PARTITION_EXTRACTOR_CLASS`.<br/>
Default value: `"partitionpath"`<br/>
-**PRECOMBINE_FIELD_OPT_KEY** (Required): When two records within the same batch have the same key value, the record with the largest value from the field specified will be choosen. If you are using default payload of OverwriteWithLatestAvroPayload for HoodieRecordPayload (`WRITE_PAYLOAD_CLASS`), an incoming record will always takes precendence compared to the one in storage ignoring this `PRECOMBINE_FIELD_OPT_KEY`. <br/>
+**PRECOMBINE_FIELD** (Required): When two records within the same batch have the same key value, the record with the largest value in the specified field is chosen. If you are using the default payload, OverwriteWithLatestAvroPayload, for HoodieRecordPayload (`WRITE_PAYLOAD_CLASS`), an incoming record always takes precedence over the one in storage, ignoring this `PRECOMBINE_FIELD`. <br/>
Default value: `"ts"`<br/>
-**OPERATION_OPT_KEY**: The [write operations](#write-operations) to use.<br/>
+**OPERATION**: The [write operations](#write-operations) to use.<br/>
Available values:<br/>
`UPSERT_OPERATION_OPT_VAL` (default), `BULK_INSERT_OPERATION_OPT_VAL`, `INSERT_OPERATION_OPT_VAL`, `DELETE_OPERATION_OPT_VAL`
-**TABLE_TYPE_OPT_KEY**: The [type of table](/docs/concepts#table-types) to write to. Note: After the initial creation of a table, this value must stay consistent when writing to (updating) the table using the Spark `SaveMode.Append` mode.<br/>
+**TABLE_TYPE**: The [type of table](/docs/concepts#table-types) to write to. Note: After the initial creation of a table, this value must stay consistent when writing to (updating) the table using the Spark `SaveMode.Append` mode.<br/>
Available values:<br/>
[`COW_TABLE_TYPE_OPT_VAL`](/docs/concepts#copy-on-write-table) (default), [`MOR_TABLE_TYPE_OPT_VAL`](/docs/concepts#merge-on-read-table)
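The `*_OPT_VAL` constants listed for `OPERATION` and `TABLE_TYPE` are plain strings. The mapping below records the values we understand them to hold in 0.9.0 (verify against `DataSourceWriteOptions` before relying on it) and serves as a hypothetical helper for checking configuration that is written as raw strings.

```java
import java.util.Map;

// Hypothetical helper; the string values are assumptions about the 0.9.0 constants.
final class WriteOptionValues {
    private WriteOptionValues() {}

    // Values accepted by OPERATION.key(), as listed above
    static final Map<String, String> OPERATIONS = Map.of(
        "UPSERT_OPERATION_OPT_VAL", "upsert",
        "BULK_INSERT_OPERATION_OPT_VAL", "bulk_insert",
        "INSERT_OPERATION_OPT_VAL", "insert",
        "DELETE_OPERATION_OPT_VAL", "delete");

    // Values accepted by TABLE_TYPE.key()
    static final Map<String, String> TABLE_TYPES = Map.of(
        "COW_TABLE_TYPE_OPT_VAL", "COPY_ON_WRITE",
        "MOR_TABLE_TYPE_OPT_VAL", "MERGE_ON_READ");

    static boolean isListedOperation(String value) {
        return OPERATIONS.containsValue(value);
    }
}
```

The quick-start guide also passes `"insert_overwrite"` and `"insert_overwrite_table"` directly as strings; those are not among the four values listed here, so this helper reports them as unlisted.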
-**KEYGENERATOR_CLASS_OPT_KEY**: Refer to [Key Generation](#key-generation) section below.
+**HoodieWriteConfig.KEYGENERATOR_CLASS_NAME**: Refer to the [Key Generation](#key-generation) section below.
-**HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY**: If using hive, specify if the table should or should not be partitioned.<br/>
+**HIVE_PARTITION_EXTRACTOR_CLASS**: If using Hive, specify whether the table should be partitioned.<br/>
Available values:<br/>
-`classOf[SlashEncodedDayPartitionValueExtractor].getCanonicalName` (default), `classOf[MultiPartKeysValueExtractor].getCanonicalName`, `classOf[TimestampBasedKeyGenerator].getCanonicalName`, `classOf[NonPartitionedExtractor].getCanonicalName`, `classOf[GlobalDeleteKeyGenerator].getCanonicalName` (to be used when `OPERATION_OPT_KEY` is set to `DELETE_OPERATION_OPT_VAL`)
+`classOf[SlashEncodedDayPartitionValueExtractor].getCanonicalName` (default), `classOf[MultiPartKeysValueExtractor].getCanonicalName`, `classOf[TimestampBasedKeyGenerator].getCanonicalName`, `classOf[NonPartitionedExtractor].getCanonicalName`, `classOf[GlobalDeleteKeyGenerator].getCanonicalName` (to be used when `OPERATION.key()` is set to `DELETE_OPERATION_OPT_VAL`)
Example:
@@ -253,10 +253,10 @@ Upsert a DataFrame, specifying the necessary field names for `recordKey => _row_
inputDF.write()
.format("org.apache.hudi")
.options(clientOpts) //Where clientOpts is of type Map[String, String]. clientOpts can include any other options necessary.
- .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
- .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
- .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
- .option(HoodieWriteConfig.TABLE_NAME, tableName)
+ .option(DataSourceWriteOptions.RECORDKEY_FIELD.key(), "_row_key")
+ .option(DataSourceWriteOptions.PARTITIONPATH_FIELD.key(), "partition")
+ .option(DataSourceWriteOptions.PRECOMBINE_FIELD.key(), "timestamp")
+ .option(HoodieWriteConfig.TBL_NAME.key(), tableName)
.mode(SaveMode.Append)
.save(basePath);
```
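The example above keys records on a single column. For a key built from several columns, the `RECORDKEY_FIELD` description says `HoodieWriteConfig.KEYGENERATOR_CLASS_NAME` must be set to match. A sketch of such an option set follows, assuming that multiple record-key fields are given as a comma-separated list and that `org.apache.hudi.keygen.ComplexKeyGenerator` is the multi-column generator in your Hudi version (check both); the helper and the column names in the usage are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: record-key, partition-path and key-generator options for one or more key columns.
final class CompositeKeyOptions {
    private CompositeKeyOptions() {}

    static Map<String, String> forColumns(List<String> keyColumns, String partitionColumn) {
        Map<String, String> opts = new LinkedHashMap<>();
        // RECORDKEY_FIELD.key(); several columns as a comma-separated list (assumption)
        opts.put("hoodie.datasource.write.recordkey.field", String.join(",", keyColumns));
        // PARTITIONPATH_FIELD.key()
        opts.put("hoodie.datasource.write.partitionpath.field", partitionColumn);
        // HoodieWriteConfig.KEYGENERATOR_CLASS_NAME.key(); SimpleKeyGenerator is the documented default
        opts.put("hoodie.datasource.write.keygenerator.class",
            keyColumns.size() > 1
                ? "org.apache.hudi.keygen.ComplexKeyGenerator"
                : "org.apache.hudi.keygen.SimpleKeyGenerator");
        return opts;
    }
}
```

For instance, `CompositeKeyOptions.forColumns(List.of("order_id", "line_no"), "partition")` yields a record-key value of `order_id,line_no` together with the multi-column generator.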
@@ -299,8 +299,8 @@ INSERT INTO hudi_table select ... from ...;
Hudi maintains hoodie keys (record key + partition path) for uniquely identifying a particular record. Key generator class will extract these out of incoming record. Both the tools above have configs to specify the `hoodie.datasource.write.keygenerator.class` property. For DeltaStreamer this would come from the property file specified in `--props` and
-DataSource writer takes this config directly using `DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY()`.
-The default value for this config is `SimpleKeyGenerator`. Note: A custom key generator class can be written/provided here as well. Primary key columns should be provided via `RECORDKEY_FIELD_OPT_KEY` option.<br/>
+DataSource writer takes this config directly using `HoodieWriteConfig.KEYGENERATOR_CLASS_NAME.key()`.
+The default value for this config is `SimpleKeyGenerator`. Note: A custom key generator class can be written/provided here as well. Primary key columns should be provided via the `RECORDKEY_FIELD` option.<br/>
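What a key generator produces is the hoodie key: a record key plus a partition path. The sketch below models the single-column case, with records as maps and a hypothetical `HoodieKeySketch` class; it mirrors the idea behind `SimpleKeyGenerator` (one key column, one partition column) but is not its implementation, and the only error handling modeled is rejecting a missing record key.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical model of a hoodie key: (record key, partition path).
final class HoodieKeySketch {
    final String recordKey;
    final String partitionPath;

    HoodieKeySketch(String recordKey, String partitionPath) {
        this.recordKey = recordKey;
        this.partitionPath = partitionPath;
    }

    /** Extracts the key from one record, using one column for the key and one for the partition. */
    static HoodieKeySketch extract(Map<String, ?> record, String keyField, String partitionField) {
        Object key = record.get(keyField);
        if (key == null) {
            throw new IllegalArgumentException("Record key field '" + keyField + "' is missing or null");
        }
        // An empty partition field name models a non-partitioned table
        Object partition = partitionField.isEmpty() ? "" : record.get(partitionField);
        return new HoodieKeySketch(key.toString(), Objects.toString(partition, ""));
    }
}
```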
Hudi currently supports different combinations of record keys and partition paths as below -
@@ -389,9 +389,9 @@ For more info refer to [Delete support in Hudi](https://cwiki.apache.org/conflue
- **Hard Deletes** : A stronger form of deletion is to physically remove any trace of the record from the table. This can be achieved in 3 different ways.
- 1) Using DataSource, set `OPERATION_OPT_KEY` to `DELETE_OPERATION_OPT_VAL`. This will remove all the records in the DataSet being submitted.
+ 1) Using DataSource, set `OPERATION.key()` to `DELETE_OPERATION_OPT_VAL`. This will remove all the records in the DataSet being submitted.
- 2) Using DataSource, set `PAYLOAD_CLASS_OPT_KEY` to `"org.apache.hudi.EmptyHoodieRecordPayload"`. This will remove all the records in the DataSet being submitted.
+ 2) Using DataSource, set `HoodieWriteConfig.WRITE_PAYLOAD_CLASS_NAME.key()` to `"org.apache.hudi.EmptyHoodieRecordPayload"`. This will remove all the records in the DataSet being submitted.
3) Using DataSource or DeltaStreamer, add a column named `_hoodie_is_deleted` to DataSet. The value of this column must be set to `true` for all the records to be deleted and either `false` or left null for any records which are to be upserted.
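Method 3 hinges on a single boolean column. The sketch below splits a batch the way that rule reads: `true` marks a record for deletion, while `false` or a missing/null value means upsert. Records are modeled as maps and the class name is hypothetical; it illustrates the rule only and is not how Hudi processes the column internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the _hoodie_is_deleted rule.
final class IsDeletedFlagSketch {
    static final String DELETE_FLAG = "_hoodie_is_deleted";

    final List<Map<String, ?>> upserts = new ArrayList<>();
    final List<Map<String, ?>> deletes = new ArrayList<>();

    /** true -> delete; false, null, or absent -> upsert. */
    static IsDeletedFlagSketch split(List<? extends Map<String, ?>> batch) {
        IsDeletedFlagSketch result = new IsDeletedFlagSketch();
        for (Map<String, ?> record : batch) {
            if (Boolean.TRUE.equals(record.get(DELETE_FLAG))) {
                result.deletes.add(record);
            } else {
                result.upserts.add(record);
            }
        }
        return result;
    }
}
```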
@@ -401,7 +401,7 @@ Example using hard delete method 2, remove all the records from the table that e
.write().format("org.apache.hudi")
.option(...) // Add HUDI options like record-key, partition-path and others as needed for your setup
// specify record_key, partition_key, precombine_fieldkey & usual params
- .option(DataSourceWriteOptions.PAYLOAD_CLASS_OPT_KEY, "org.apache.hudi.EmptyHoodieRecordPayload")
+ .option(HoodieWriteConfig.WRITE_PAYLOAD_CLASS_NAME.key(), "org.apache.hudi.EmptyHoodieRecordPayload")
```