xushiyan commented on code in PR #14322: URL: https://github.com/apache/hudi/pull/14322#discussion_r2553840367
##########
website/docs/write_operations.md:
##########
@@ -93,27 +93,27 @@ Here are the basic configs relevant to the write operation types mentioned above
**Spark based configs:**
-| Config Name | Default | Description |
-|------------------------------------------------|----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
-| hoodie.datasource.write.operation | upsert (Optional) | Whether to do upsert, insert or bulk_insert for the write operation. Use bulk_insert to load new data into a table, and there on use upsert/insert. bulk insert uses a disk based write path to scale to load large inputs without need to cache it.<br /><br />`Config Param: OPERATION` |
-| hoodie.datasource.write.precombine.field | ts (Optional) | Field used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)<br /><br />`Config Param: PRECOMBINE_FIELD` |
-| hoodie.combine.before.insert | false (Optional) | When inserted records share same key, controls whether they should be first combined (i.e de-duplicated) before writing to storage.<br /><br />`Config Param: COMBINE_BEFORE_INSERT` |
-| hoodie.datasource.write.insert.drop.duplicates | false (Optional) | If set to true, records from the incoming dataframe will not overwrite existing records with the same key during the write operation. This config is deprecated as of 0.14.0. Please use hoodie.datasource.insert.dup.policy instead.<br /><br />`Config Param: INSERT_DROP_DUPS` |
-| hoodie.bulkinsert.sort.mode | NONE (Optional) | org.apache.hudi.execution.bulkinsert.BulkInsertSortMode: Modes for sorting records during bulk insert. <ul><li>`NONE(default)`: No sorting. Fastest and matches `spark.write.parquet()` in number of files and overhead.</li><li>`GLOBAL_SORT`: This ensures best file sizes, with lowest memory overhead at cost of sorting.</li><li>`PARTITION_SORT`: Strikes a balance by only sorting within a Spark RDD partition, still keeping the memory overhead of writing low. File sizing is not as good as `GLOBAL_SORT`.</li><li>`PARTITION_PATH_REPARTITION`: This ensures that the data for a single physical partition in the table is written by the same Spark executor. This should only be used when input data is evenly distributed across different partition paths. If data is skewed (most records are intended for a handful of partition paths among all) then this can cause an imbalance among Spark executors.</li><li>`PARTITION_PATH_REPARTITION_AND_SORT`: This ensures that the data for a single physical partition in the table is written by the same Spark executor. This should only be used when input data is evenly distributed across different partition paths. Compared to `PARTITION_PATH_REPARTITION`, this sort mode does an additional step of sorting the records based on the partition path within a single Spark partition, given that data for multiple physical partitions can be sent to the same Spark partition and executor. If data is skewed (most records are intended for a handful of partition paths among all) then this can cause an imbalance among Spark executors.</li></ul><br />`Config Param: BULK_INSERT_SORT_MODE` |
-| hoodie.bootstrap.base.path | N/A **(Required)** | **Applicable only when** operation type is `bootstrap`. Base path of the dataset that needs to be bootstrapped as a Hudi table<br /><br />`Config Param: BASE_PATH`<br />`Since Version: 0.6.0` |
-| hoodie.bootstrap.mode.selector | org.apache.hudi.client.bootstrap.selector.MetadataOnlyBootstrapModeSelector (Optional) | Selects the mode in which each file/partition in the bootstrapped dataset gets bootstrapped<br />Possible values:<ul><li>`org.apache.hudi.client.bootstrap.selector.MetadataOnlyBootstrapModeSelector`: In this mode, the full record data is not copied into Hudi therefore it avoids full cost of rewriting the dataset. Instead, 'skeleton' files containing just the corresponding metadata columns are added to the Hudi table. Hudi relies on the data in the original table and will face data-loss or corruption if files in the original table location are deleted or modified.</li><li>`org.apache.hudi.client.bootstrap.selector.FullRecordBootstrapModeSelector`: In this mode, the full record data is copied into hudi and metadata columns are added. A full record bootstrap is functionally equivalent to a bulk-insert. After a full record bootstrap, Hudi will function properly even if the original table is modified or deleted.</li><li>`org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector`: A bootstrap selector which employs bootstrap mode by specified partitions.</li></ul><br />`Config Param: MODE_SELECTOR_CLASS_NAME`<br />`Since Version: 0.6.0` |
-| hoodie.datasource.write.partitions.to.delete | N/A **(Required)** | **Applicable only when** operation type is `delete_partition`. Comma separated list of partitions to delete. Allows use of wildcard *<br /><br />`Config Param: PARTITIONS_TO_DELETE` |
+| Config Name | Default | Description |
+|------------------------------------------------|----------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
+| hoodie.datasource.write.operation | upsert (Optional) | Whether to do upsert, insert or bulk_insert for the write operation. Use bulk_insert to load new data into a table, and there on use upsert/insert. bulk insert uses a disk based write path to scale to load large inputs without need to cache it.<br /><br />`Config Param: OPERATION` |
+| hoodie.datasource.write.precombine.field | (no default) (Optional) | Field used for ordering records before actual write. When two records have the same key value, we will pick the one with the largest value for the ordering field, determined by Object.compareTo(..). Note: This config is deprecated, use `hoodie.table.ordering.fields` instead.<br /><br />`Config Param: PRECOMBINE_FIELD` |
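
For context on how the configs in this table reach a writer, here is an illustrative sketch (not part of the diff) of passing them as Spark datasource options. The table name, field names, and path are hypothetical, and the commented-out write call assumes a SparkSession with the Hudi bundle on the classpath.

```python
# Hypothetical example: write-operation configs from the table above,
# collected as Spark datasource options for a first-time bulk load.
hudi_options = {
    "hoodie.table.name": "trips",                        # hypothetical table name
    "hoodie.datasource.write.operation": "bulk_insert",  # use bulk_insert to load new data into a table
    "hoodie.bulkinsert.sort.mode": "GLOBAL_SORT",        # best file sizes, at the cost of a sort
    "hoodie.datasource.write.precombine.field": "ts",    # ordering field used to pick among records with the same key
}

# With a DataFrame `df` in scope, the write would look like:
# df.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/trips")
```

Subsequent writes to the same table would switch the operation to `upsert` or `insert`, per the `hoodie.datasource.write.operation` description above.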
Review Comment:
this diff is just formatting, apart from this line removing the `ts` default value
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
