This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new c32f1a754be [HUDI-8671][DOCS] Improve record merger docs (#12440)
c32f1a754be is described below

commit c32f1a754be60677b5d8e638b52eaae80215801a
Author: Y Ethan Guo <[email protected]>
AuthorDate: Fri Dec 6 05:40:18 2024 -0800

    [HUDI-8671][DOCS] Improve record merger docs (#12440)
---
 website/docs/record_merger.md | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/website/docs/record_merger.md b/website/docs/record_merger.md
index 983b0f63da9..d98a5fc462a 100644
--- a/website/docs/record_merger.md
+++ b/website/docs/record_merger.md
@@ -8,7 +8,7 @@ toc_max_heading_level: 4
 
 Hudi handles mutations to records and streaming data, as we briefly touched upon in [timeline ordering](/docs/next/timeline#ordering-of-actions) section. 
 To provide users full-fledged support for stream processing, Hudi goes all the way making the storage engine and the underlying storage format 
-understand how to merge change changes to the same record key, that may arrive even in different order at different times. With the rise of mobile applications
+understand how to merge changes to the same record key, that may arrive even in different order at different times. With the rise of mobile applications
  and IoT, these scenarios have become the normal than an exception. For e.g. a social networking application uploading user events several hours after they happened,
 when the user connects to WiFi networks.
 
@@ -30,7 +30,7 @@ different merge results when switching between the modes.
 
 ### COMMIT_TIME_ORDERING
 
-This is the default merge mode. Here, we expect the input records to arrive in strict order such that arrival order is same as their
+Here, we expect the input records to arrive in strict order such that arrival order is same as their
 delta commit order on the table. Merging simply picks the record belonging to the latest write as the merged result. In relational data mode speak, 
 this provides overwrite semantics aligned with serializable writes on the timeline. 
 
@@ -43,7 +43,7 @@ that denotes the ordering of the writes in the upstream database.
 
 ### EVENT_TIME_ORDERING
 
-While commit time ordering provides a well-understood standard behavior, it's hardly sufficient. The commit time is unrelated to the actual 
+This is the default merge mode. While commit time ordering provides a well-understood standard behavior, it's hardly sufficient. The commit time is unrelated to the actual 
 ordering of data that a user may care about and strict ordering of input in complex distributed systems is difficult to achieve. 
 With event time ordering, the merging picks the record with the highest value on a user specified _**ordering or precombine field**_ as the merged result. 
 
@@ -52,7 +52,7 @@ With event time ordering, the merging picks the record with the highest value on
 </figure>
 
 In the example above, two microservices product change records about orders at different times, that can arrive out-of-order. As color coded, 
-this can lead to application level inconsistent states in the table if simply merged in commit time order like a cancelled order being re-created or 
+this can lead to application-level inconsistent states in the table if simply merged in commit time order like a cancelled order being re-created or 
 a paid order moved back to just created state expecting payment again. Event time ordering helps by ignoring older state changes that arrive late and
 avoiding order status from "jumping back" in time. Combined with [non-blocking concurrency control](/docs/next/concurrency_control#non-blocking-concurrency-control-mode), 
 this provides a very powerful way for processing such data streams efficiently and correctly.
@@ -67,7 +67,7 @@ across order_info and payment_info without costly self-join on each access.
 
 Hudi allows authoring of cross-language custom record mergers on top of a standard record merger API, that supports full and partial merges. The java APIs 
 are sketched below at a high-level. It simply takes older/newer records in engine native formats and produces a merged record or returns empty to skip them entirely (e.g. soft deletes).
-Record merger is configured using a `hoodie.record.merge.strategy.id` table config whose value is an uuid, that is expected to be returned by `getMergingStrategy()`
+Record merger is configured using a `hoodie.write.record.merge.strategy.id` write config whose value is an uuid, that is taken by the writer to persist in the table config, and is expected to be returned by `getMergingStrategy()`
 method below. Using this mechanism, Hudi can automatically deduce the record merger to use for the table across different language/engine runtimes.
 
 ```Java
@@ -91,6 +91,15 @@ interface HoodieRecordMerger {
 }
 ```
 
+### Record Merge Configs
+
+The record merge mode and optional record merge strategy ID and custom merge implementation classes can be specified using the below configs. 
+
+| Config Name                            | Default             | Description                                                                                                                                                                                                                                          |
+| ---------------------------------------| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| hoodie.write.record.merge.mode  | EVENT_TIME_ORDERING | Determines the logic of merging different records with the same record key. Valid values: (1) `COMMIT_TIME_ORDERING`: use commit time to merge records, i.e., the record from later commit overwrites the earlier record with the same key. (2) `EVENT_TIME_ORDERING` (default): use event time as the ordering to merge records, i.e., the record with the larger event time overwrites the record with the smaller event time on the same key, r [...]
+| hoodie.write.record.merge.strategy.id  | N/A (Optional) | ID of record merge strategy. When you specify this config, you also need to specify `hoodie.write.record.merge.custom.implementation.classes`. Hudi picks the `HoodieRecordMerger` implementation class from the list of classes in `hoodie.write.record.merge.custom.implementation.classes` that has the specified merge strategy ID.<br />`Config Param: RECORD_MERGE_STRATEGY_ID`<br />`Since Version: 0.13.0` |
+| hoodie.write.record.merge.custom.implementation.classes  | N/A (Optional) | List of `HoodieRecordMerger` implementations constituting Hudi's merging strategy based on the engine used. Hudi picks the `HoodieRecordMerger` implementation class from this list based on the specified `hoodie.write.record.merge.strategy.id`.<br />`Config Param: RECORD_MERGE_IMPL_CLASSES`<br />`Since Version: 0.13.0` |
 
 
 ### Record Payloads
@@ -224,7 +233,7 @@ Result data after merging using `PartialUpdateAvroPayload`:
 
 Payload class can be specified using the below configs. For more advanced configs refer [here](https://hudi.apache.org/docs/configurations#RECORD_PAYLOAD)
 
-**Spark based configs;**
+**Spark based configs:**
 
 | Config Name                            | Default             | Description                                                                                                                                                                                                                                          |
 | ---------------------------------------| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
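
The merge configs documented by this patch are plain string write options, so they can be assembled from any JVM writer. A minimal illustrative sketch follows; only the three `hoodie.write.record.merge.*` keys come from the patch, while the UUID and merger class name are made-up placeholders, not real Hudi values:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: builds the writer options described in the patch above.
public class MergeConfigExample {

    // Default behavior: pick the record with the highest ordering/precombine value.
    static Map<String, String> eventTimeOrderingOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        // EVENT_TIME_ORDERING is the default merge mode per the updated docs.
        opts.put("hoodie.write.record.merge.mode", "EVENT_TIME_ORDERING");
        return opts;
    }

    // Custom merger: per the config table, the strategy ID and the implementation
    // class list must be specified together. Both values below are hypothetical.
    static Map<String, String> customMergerOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.write.record.merge.strategy.id",
                 "00000000-0000-0000-0000-000000000000");
        opts.put("hoodie.write.record.merge.custom.implementation.classes",
                 "com.example.MyRecordMerger");
        return opts;
    }

    public static void main(String[] args) {
        eventTimeOrderingOptions().forEach((k, v) -> System.out.println(k + "=" + v));
        customMergerOptions().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The strategy ID set here would have to match the UUID returned by the merger's `getMergingStrategy()`, which is how Hudi resolves the right implementation across engine runtimes.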
