[hudi] branch asf-site updated: [DOCS] Update Record payload page (#9562)

bhavanisudha Tue, 29 Aug 2023 05:44:51 -0700

This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 586b4638e34 [DOCS] Update Record payload page (#9562)
586b4638e34 is described below

commit 586b4638e3423b26f70156d27e5932bd8513d15f
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Tue Aug 29 05:44:39 2023 -0700

    [DOCS] Update Record payload page (#9562)
---
 website/docs/record_payload.md     | 57 ++++++++++++++++++++++++++++++--------
 website/src/theme/DocPage/index.js |  2 +-
 2 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/website/docs/record_payload.md b/website/docs/record_payload.md
index 3fcc05d8b6d..1ed47b2ca96 100644
--- a/website/docs/record_payload.md
+++ b/website/docs/record_payload.md
@@ -1,9 +1,12 @@
 ---
 title: Record Payload 
 keywords: [hudi, merge, upsert, precombine]
+toc: true
+toc_min_heading_level: 2
+toc_max_heading_level: 4
 ---
 
-## Record Payload
+### Background
 
 One of the core features of Hudi is the ability to incrementally upsert data, 
deduplicate and merge records on the fly.
 Additionally, users can implement their custom logic to merge the input 
records with the record on storage. Record
@@ -20,15 +23,38 @@ stage, Hudi performs any deduplication based on the payload 
implementation and p
 Further, on index lookup, Hudi identifies which records are being updated and 
the record payload implementation tells
 Hudi how to merge the incoming record with the existing record on storage.
 
+
+### Configs
+
+The payload class can be specified using the configs below. For more advanced configs, refer [here](https://hudi.apache.org/docs/configurations#RECORD_PAYLOAD).
+
+**Spark based configs:**
+
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| hoodie.datasource.write.payload.class | org.apache.hudi.common.model.OverwriteWithLatestAvroPayload (Optional) | Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting. This will render any value set for PRECOMBINE_FIELD_OPT_VAL ineffective<br /><br />`Config Param: WRITE_PAYLOAD_CLASS_NAME` |
+
+**Flink based configs:**
+
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| payload.class | org.apache.hudi.common.model.EventTimeAvroPayload (Optional) | Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting. This will render any value set for the option ineffective<br /><br />`Config Param: PAYLOAD_CLASS_NAME` |
+
 ### Existing Payloads
 
 #### OverwriteWithLatestAvroPayload
+```scala
+hoodie.datasource.write.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
+```
 
 This is the default record payload implementation. It picks the record with 
the greatest value (determined by calling
 `.compareTo()` on the value of precombine key) to break ties and simply picks 
the latest record while merging. This gives
 latest-write-wins style semantics.
 
 #### DefaultHoodieRecordPayload
+```scala
+hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload
+```
 While `OverwriteWithLatestAvroPayload` precombines based on an ordering field 
and picks the latest record while merging,
 `DefaultHoodieRecordPayload` honors the ordering field for both precombining 
and merging. Let's understand the difference with an example:
 
@@ -74,20 +100,26 @@ Result data after merging using 
`DefaultHoodieRecordPayload` (always honors orde
 ```
 
 #### EventTimeAvroPayload
-
-Some use cases require merging records by event time and thus event time plays 
the role of an ordering field. This
-payload is particularly useful in the case of late-arriving data. For such use 
cases, users need to set
-the [payload event time field](/docs/configurations#RECORD_PAYLOAD) 
configuration.
+```scala
+hoodie.datasource.write.payload.class=org.apache.hudi.common.model.EventTimeAvroPayload
+```
+This is the default record payload for Flink based writing. Some use cases 
require merging records by event time and 
+thus event time plays the role of an ordering field. This payload is 
particularly useful in the case of late-arriving data. 
+For such use cases, users need to set the [payload event time 
field](/docs/configurations#RECORD_PAYLOAD) configuration.
 
 #### OverwriteNonDefaultsWithLatestAvroPayload
-
+```scala
+hoodie.datasource.write.payload.class=org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload
+```
 This payload is quite similar to `OverwriteWithLatestAvroPayload` with a slight 
difference while merging records. For
 precombining, just like `OverwriteWithLatestAvroPayload`, it picks the latest 
record for a key, based on an ordering
 field. While merging, it overwrites the existing record on storage only for 
the specified **fields that don't equal
 default value** for that field.
 
 #### PartialUpdateAvroPayload
-
+```scala
+hoodie.datasource.write.payload.class=org.apache.hudi.common.model.PartialUpdateAvroPayload
+```
 This payload supports partial update. Typically, once the merge step resolves 
which record to pick, then the record on
 storage is fully replaced by the resolved record. But, in some cases, the 
requirement is to update only certain fields
 and not replace the whole record. This is called partial update. 
`PartialUpdateAvroPayload` provides out-of-box support 
@@ -132,12 +164,13 @@ Result data after merging using 
`PartialUpdateAvroPayload`:
 In this document, we highlighted the role of record payload to support fast 
incremental ETL with updates and deletes. We
 also talked about some payload implementations readily provided by Hudi. There 
are quite a few other implementations
 and developers would be interested in looking at the hierarchy of 
`HoodieRecordPayload` interface. For
-example, `MySqlDebeziumAvroPayload` and `PostgresDebeziumAvroPayload` provides 
support for seamlessly applying changes
-captured via Debezium for MySQL and PostgresDB. `AWSDmsAvroPayload` provides 
support for applying changes captured via
-Amazon Database Migration Service onto S3.
+example, [`MySqlDebeziumAvroPayload`](https://github.com/apache/hudi/blob/e76dd102bcaf8aec5a932e7277ccdbfd73ce1a32/hudi-common/src/main/java/org/apache/hudi/common/model/debezium/MySqlDebeziumAvroPayload.java)
+and [`PostgresDebeziumAvroPayload`](https://github.com/apache/hudi/blob/e76dd102bcaf8aec5a932e7277ccdbfd73ce1a32/hudi-common/src/main/java/org/apache/hudi/common/model/debezium/PostgresDebeziumAvroPayload.java)
+provide support for seamlessly applying changes captured via Debezium for MySQL and PostgresDB.
+[`AWSDmsAvroPayload`](https://github.com/apache/hudi/blob/e76dd102bcaf8aec5a932e7277ccdbfd73ce1a32/hudi-common/src/main/java/org/apache/hudi/common/model/AWSDmsAvroPayload.java)
+provides support for applying changes captured via Amazon Database Migration Service onto S3.
 
 Record payloads are tunable to suit many use cases. Please check out the 
configurations
 listed [here](/docs/configurations#RECORD_PAYLOAD). Moreover, if users want to 
implement their own custom merge logic,
-please check
-out [this 
FAQ](/docs/faq/#can-i-implement-my-own-logic-for-how-input-records-are-merged-with-record-on-storage).
 In a
+please check out [this 
FAQ](/docs/faq/#can-i-implement-my-own-logic-for-how-input-records-are-merged-with-record-on-storage).
 In a
 separate document, we will talk about a new record merger API for optimized 
payload handling.
diff --git a/website/src/theme/DocPage/index.js b/website/src/theme/DocPage/index.js
index 640f37b3915..978dfb6318f 100644
--- a/website/src/theme/DocPage/index.js
+++ b/website/src/theme/DocPage/index.js
@@ -128,7 +128,7 @@ function DocPageContent({
   );
 }
 
-const arrayOfPages = (matchPath) => [`${matchPath}/configurations`, `${matchPath}/basic_configurations`, `${matchPath}/timeline`, `${matchPath}/table_types`, `${matchPath}/migration_guide`, `${matchPath}/compaction`, `${matchPath}/clustering`, `${matchPath}/indexing`, `${matchPath}/metadata`, `${matchPath}/metadata_indexing`];
+const arrayOfPages = (matchPath) => [`${matchPath}/configurations`, `${matchPath}/basic_configurations`, `${matchPath}/timeline`, `${matchPath}/table_types`, `${matchPath}/migration_guide`, `${matchPath}/compaction`, `${matchPath}/clustering`, `${matchPath}/indexing`, `${matchPath}/metadata`, `${matchPath}/metadata_indexing`, `${matchPath}/record_payload`];
 const showCustomStylesForDocs = (matchPath, pathname) => arrayOfPages(matchPath).includes(pathname);
 function DocPage(props) {
   const {
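
The effect of the `index.js` hunk above can be checked in isolation. The following is a minimal standalone sketch, not the site's actual module: it copies the two helpers from the diff and calls them with an assumed `matchPath` of `/docs` and a hypothetical `/docs/quick-start-guide` path (neither value appears in this email; in the real site they come from the router).

```javascript
// Helpers copied from the "+" side of the diff, reflowed for readability.
const arrayOfPages = (matchPath) => [
  `${matchPath}/configurations`,
  `${matchPath}/basic_configurations`,
  `${matchPath}/timeline`,
  `${matchPath}/table_types`,
  `${matchPath}/migration_guide`,
  `${matchPath}/compaction`,
  `${matchPath}/clustering`,
  `${matchPath}/indexing`,
  `${matchPath}/metadata`,
  `${matchPath}/metadata_indexing`,
  `${matchPath}/record_payload`, // entry added by this commit
];

const showCustomStylesForDocs = (matchPath, pathname) =>
  arrayOfPages(matchPath).includes(pathname);

// Assumed values for illustration only.
const matchPath = "/docs";
const onPayloadPage = showCustomStylesForDocs(matchPath, "/docs/record_payload");
const onOtherPage = showCustomStylesForDocs(matchPath, "/docs/quick-start-guide");

console.log(onPayloadPage); // true: the page is now in the list
console.log(onOtherPage);   // false: this path is not in the list
```

Because the check is an exact string match via `Array.prototype.includes`, a pathname with a trailing slash or a version prefix would not match unless `matchPath` already accounts for it.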
