xushiyan commented on code in PR #14322: URL: https://github.com/apache/hudi/pull/14322#discussion_r2553840367
##########
website/docs/write_operations.md:
##########
@@ -93,27 +93,27 @@ Here are the basic configs relevant to the write operation types mentioned above
**Spark based configs:**
-| Config Name | Default | Description |
-|------------------------------------------------|----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
-| hoodie.datasource.write.operation | upsert (Optional) | Whether to do upsert, insert or bulk_insert for the write operation. Use bulk_insert to load new data into a table, and there on use upsert/insert. bulk insert uses a disk based write path to scale to load large inputs without need to cache it.<br /><br />`Config Param: OPERATION` |
-| hoodie.datasource.write.precombine.field | ts (Optional) | Field used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)<br /><br />`Config Param: PRECOMBINE_FIELD` |
-| hoodie.combine.before.insert | false (Optional) | When inserted records share same key, controls whether they should be first combined (i.e de-duplicated) before writing to storage.<br /><br />`Config Param: COMBINE_BEFORE_INSERT` |
-| hoodie.datasource.write.insert.drop.duplicates | false (Optional) | If set to true, records from the incoming dataframe will not overwrite existing records with the same key during the write operation. This config is deprecated as of 0.14.0. Please use hoodie.datasource.insert.dup.policy instead.<br /><br />`Config Param: INSERT_DROP_DUPS` |
-| hoodie.bulkinsert.sort.mode | NONE (Optional) | org.apache.hudi.execution.bulkinsert.BulkInsertSortMode: Modes for sorting records during bulk insert. <ul><li>`NONE(default)`: No sorting. Fastest and matches `spark.write.parquet()` in number of files and overhead.</li><li>`GLOBAL_SORT`: This ensures best file sizes, with lowest memory overhead at cost of sorting.</li><li>`PARTITION_SORT`: Strikes a balance by only sorting within a Spark RDD partition, still keeping the memory overhead of writing low. File sizing is not as good as `GLOBAL_SORT`.</li><li>`PARTITION_PATH_REPARTITION`: This ensures that the data for a single physical partition in the table is written by the same Spark executor. This should only be used when input data is evenly distributed across different partition paths. If data is skewed (most records are intended for a handful of partition paths among all) then this can cause an imbalance among Spark executors.</li><li>`PARTITION_PATH_REPARTITION_AND_SORT`: This ensures that the data for a single physical partition in the table is written by the same Spark executor. This should only be used when input data is evenly distributed across different partition paths. Compared to `PARTITION_PATH_REPARTITION`, this sort mode does an additional step of sorting the records based on the partition path within a single Spark partition, given that data for multiple physical partitions can be sent to the same Spark partition and executor. If data is skewed (most records are intended for a handful of partition paths among all) then this can cause an imbalance among Spark executors.</li></ul><br />`Config Param: BULK_INSERT_SORT_MODE` |
-| hoodie.bootstrap.base.path | N/A **(Required)** | **Applicable only when** operation type is `bootstrap`. Base path of the dataset that needs to be bootstrapped as a Hudi table<br /><br />`Config Param: BASE_PATH`<br />`Since Version: 0.6.0` |
-| hoodie.bootstrap.mode.selector | org.apache.hudi.client.bootstrap.selector.MetadataOnlyBootstrapModeSelector (Optional) | Selects the mode in which each file/partition in the bootstrapped dataset gets bootstrapped<br />Possible values:<ul><li>`org.apache.hudi.client.bootstrap.selector.MetadataOnlyBootstrapModeSelector`: In this mode, the full record data is not copied into Hudi therefore it avoids full cost of rewriting the dataset. Instead, 'skeleton' files containing just the corresponding metadata columns are added to the Hudi table. Hudi relies on the data in the original table and will face data-loss or corruption if files in the original table location are deleted or modified.</li><li>`org.apache.hudi.client.bootstrap.selector.FullRecordBootstrapModeSelector`: In this mode, the full record data is copied into hudi and metadata columns are added. A full record bootstrap is functionally equivalent to a bulk-insert. After a full record bootstrap, Hudi will function properly even if the original table is modified or deleted.</li><li>`org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector`: A bootstrap selector which employs bootstrap mode by specified partitions.</li></ul><br />`Config Param: MODE_SELECTOR_CLASS_NAME`<br />`Since Version: 0.6.0` |
-| hoodie.datasource.write.partitions.to.delete | N/A **(Required)** | **Applicable only when** operation type is `delete_partition`. Comma separated list of partitions to delete. Allows use of wildcard *<br /><br />`Config Param: PARTITIONS_TO_DELETE` |
+| Config Name | Default | Description |
+|------------------------------------------------|----------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
+| hoodie.datasource.write.operation | upsert (Optional) | Whether to do upsert, insert or bulk_insert for the write operation. Use bulk_insert to load new data into a table, and there on use upsert/insert. bulk insert uses a disk based write path to scale to load large inputs without need to cache it.<br /><br />`Config Param: OPERATION` |
+| hoodie.datasource.write.precombine.field | (no default) (Optional) | Field used for ordering records before actual write. When two records have the same key value, we will pick the one with the largest value for the ordering field, determined by Object.compareTo(..). Note: This config is deprecated, use `hoodie.table.ordering.fields` instead.<br /><br />`Config Param: PRECOMBINE_FIELD` |
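
For context on how the configs in this table reach a writer, here is an illustrative sketch (not part of the diff) of passing them as Spark datasource options. The table name, field names, and path are hypothetical, and the commented-out write call assumes a SparkSession with the Hudi bundle on the classpath.

```python
# Hypothetical example: write-operation configs from the table above,
# collected as Spark datasource options for a first-time bulk load.
hudi_options = {
    "hoodie.table.name": "trips",                        # hypothetical table name
    "hoodie.datasource.write.operation": "bulk_insert",  # use bulk_insert to load new data into a table
    "hoodie.bulkinsert.sort.mode": "GLOBAL_SORT",        # best file sizes, at the cost of a sort
    "hoodie.datasource.write.precombine.field": "ts",    # ordering field used to pick among records with the same key
}

# With a DataFrame `df` in scope, the write would look like:
# df.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/trips")
```

Subsequent writes to the same table would switch the operation to `upsert` or `insert`, per the `hoodie.datasource.write.operation` description above.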
Review Comment:
this diff is just formatting, apart from this line removing the `ts` default value
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
