[ 
https://issues.apache.org/jira/browse/HUDI-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo updated HUDI-7678:
------------------------------
    Description: 
This ticket is to make sure that in Hudi 1.0, all existing built-in and custom 
payload classes still work, providing the same functionality as in 0.x 
releases.  They can be realized either through the new record merge API 
implementation, or through retrofitting the existing payload class 
implementation.

We'll fully migrate all existing payload logic to the new record merge 
implementation in HUDI-8401; i.e., with the move towards making partial updates 
a first-class citizen that does not need any special payloads/mergers, we need 
to move the CDC payloads to all be transformers in the Hudi Streamer and SQL 
write paths, along with migration instructions for users. 
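
The retrofit path could be sketched roughly as below. Note that 
{{LegacyPayload}}, {{Merger}}, and {{retrofit}} here are simplified, 
hypothetical stand-ins for illustration only; the actual 
{{HoodieRecordMerger}} and payload interfaces in Hudi have different 
signatures.

```java
import java.util.Optional;
import java.util.function.Function;

// Sketch: adapting a 0.x-style payload behind a new-style merge API.
// These interfaces are simplified stand-ins, NOT the real Hudi classes.
public class PayloadRetrofitSketch {

    /** Stand-in for a 0.x payload's combine-and-get-update semantics. */
    interface LegacyPayload<T> {
        // Empty result signals a delete, mirroring 0.x payload conventions.
        Optional<T> combineAndGetUpdateValue(T currentValue);
    }

    /** Stand-in for the new record merge API. */
    interface Merger<T> {
        Optional<T> merge(T older, T newer);
    }

    /** Wraps a legacy payload factory so it satisfies the merge API. */
    static <T> Merger<T> retrofit(Function<T, LegacyPayload<T>> payloadFor) {
        return (older, newer) -> payloadFor.apply(newer).combineAndGetUpdateValue(older);
    }

    public static void main(String[] args) {
        // Example: overwrite-with-latest semantics; empty newer value means delete.
        Merger<String> merger = retrofit(newer ->
            older -> newer.isEmpty() ? Optional.empty() : Optional.of(newer));

        if (!merger.merge("old", "new").equals(Optional.of("new"))) throw new AssertionError();
        if (merger.merge("old", "").isPresent()) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The point of the sketch is that the merge API only needs the legacy payload's 
combine semantics, so existing payloads can keep their behavior unchanged 
behind the adapter.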

  was:
With the move towards making partial updates a first class citizen, that does 
not need any special payloads/merges, we need to move the CDC payloads to all 
be transformers in Hudi Streamer and SQL write path. Along with migration 
instructions to users. 
 # Partial update has been implemented for the Spark SQL source as follows:
 ## The configuration {{hoodie.write.partial.update.schema}} is used for 
partial update.
 ## {{ExpressionPayload}} creates the writer schema based on the configuration.
 ## {{HoodieAppendHandle}} creates the log file based on the configuration and 
the corresponding partial schema.
 ## Currently, this handle assumes these records are all update records.
 ## We need to understand whether ExpressionPayload/SQL Merger is needed going 
forward. 
 # For DeltaStreamer, our goal is to remove all siloed CDC payloads, e.g., 
Debezium or AWSDMS, and to provide CDC data as the {{InternalRow}} type. 
Therefore,
 ## The {{transformer}} in DeltaStreamer prepares the data according to the 
types of the sources.
 ## Initially, it's okay to just support full row updates/deletes/etc. 
 # Audit that all of them properly combine I/U/D into data and delete blocks, 
such that U-after-D and D-after-U scenarios are handled as expected.
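
The ordering audit in the last item can be illustrated with a toy replay of 
ordered change events. This is an illustrative model only, not Hudi's actual 
log-block merge code; {{Event}}, {{Op}}, and {{replay}} are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of I/U/D ordering semantics, NOT Hudi's actual merge code.
public class CdcOrderingSketch {
    enum Op { I, U, D }
    record Event(String key, Op op, String value) {}

    /** Replays change events in order: a U after D re-inserts the record,
     *  and a D after U removes it. */
    static Map<String, String> replay(List<Event> events) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Event e : events) {
            if (e.op() == Op.D) state.remove(e.key());
            else state.put(e.key(), e.value());
        }
        return state;
    }

    public static void main(String[] args) {
        // U after D: the record comes back with the new value.
        Map<String, String> s1 = replay(List.of(
            new Event("k", Op.I, "v1"),
            new Event("k", Op.D, null),
            new Event("k", Op.U, "v2")));
        if (!"v2".equals(s1.get("k"))) throw new AssertionError();

        // D after U: the record is gone.
        Map<String, String> s2 = replay(List.of(
            new Event("k", Op.I, "v1"),
            new Event("k", Op.U, "v2"),
            new Event("k", Op.D, null)));
        if (s2.containsKey("k")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Whatever the real implementation looks like, the audit should verify that 
combining data and delete blocks preserves exactly this per-key ordering 
behavior.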


> Finalize the Merger APIs and make a plan for moving over all existing 
> built-in, custom payloads.
> ------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-7678
>                 URL: https://issues.apache.org/jira/browse/HUDI-7678
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Vinoth Chandar
>            Assignee: Y Ethan Guo
>            Priority: Blocker
>             Fix For: 1.0.0
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> This ticket is to make sure that in Hudi 1.0, all existing built-in and custom 
> payload classes still work, providing the same functionality as in 0.x 
> releases.  They can be realized either through the new record merge API 
> implementation, or through retrofitting the existing payload class 
> implementation.
> We'll fully migrate all existing payload logic to the new record merge 
> implementation in HUDI-8401; i.e., with the move towards making partial 
> updates a first-class citizen that does not need any special 
> payloads/mergers, we need to move the CDC payloads to all be transformers in 
> the Hudi Streamer and SQL write paths, along with migration instructions for users. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
