[jira] [Updated] (HUDI-7678) Finalize the Merger APIs and make a plan for moving over all existing built-in, custom payloads.

Y Ethan Guo (Jira) Mon, 23 Sep 2024 11:14:04 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Y Ethan Guo updated HUDI-7678:
------------------------------
    Sprint: Sprint 2024-04-26, 2024/06/17-30, 2024/06/03-16, Hudi 1.0 Sprint 
2024/09/09-15, Hudi 1.0 Sprint 2024/09/16-22, Hudi 1.0 Sprint 2024/09/16-23  
(was: Sprint 2024-04-26, 2024/06/17-30, 2024/06/03-16, Hudi 1.0 Sprint 
2024/09/09-15, Hudi 1.0 Sprint 2024/09/16-22)

> Finalize the Merger APIs and make a plan for moving over all existing 
> built-in, custom payloads.
> ------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-7678
>                 URL: https://issues.apache.org/jira/browse/HUDI-7678
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Vinoth Chandar
>            Assignee: Ethan Guo (this is the old account; please use "yihua")
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> With the move towards making partial updates a first-class citizen that does 
> not need any special payloads/merges, we need to move all the CDC payloads to 
> be transformers in the Hudi Streamer and SQL write paths, along with 
> migration instructions for users. 
>  # Partial update has been implemented for the Spark SQL source as follows:
>  ## Configuration {{hoodie.write.partial.update.schema}} is used for 
> partial update.
>  ## {{ExpressionPayload}} creates the writer schema based on the 
> configuration.
>  ## {{HoodieAppendHandle}} creates the log file based on the configuration 
> and the corresponding partial schema.
>  ## Currently this handle assumes these records are all update records.
>  ## We need to understand whether ExpressionPayload/SQL Merger is needed 
> going forward. 
>  # For DeltaStreamer, our goal is to remove all siloed CDC payloads, e.g., 
> Debezium or AWSDMS, and to provide CDC data as {{InternalRow}} type. 
> Therefore,
>  ## The {{transformer}} in DeltaStreamer prepares the data according to the 
> types of the sources.
>  ## Initially, it's okay to just support full row updates/deletes/... 
>  # Audit all of them to confirm that they properly combine I/U/D into data 
> and delete blocks, such that U after D and D after U scenarios are handled 
> as expected.
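
The U-after-D and D-after-U scenarios in the last item can be illustrated with a minimal sketch. It assumes a simple "latest ordering value wins" rule per record key; the `ChangeEvent` type, its fields, and `merge_events` are hypothetical names used only for illustration and are not part of Hudi's merger API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ChangeEvent:
    """Hypothetical CDC event; not a Hudi class."""
    key: str                    # record key
    op: str                     # "I" (insert), "U" (update), or "D" (delete)
    ordering: int               # assumed ordering value, e.g. a commit sequence
    row: Optional[dict] = None  # full row for I/U, None for D


def merge_events(events: Iterable[ChangeEvent]) -> Dict[str, Optional[dict]]:
    """Keep the event with the highest ordering value per key.

    Assumption: the latest event wins regardless of arrival order.
    A key maps to None when its latest event is a delete.
    """
    latest: Dict[str, ChangeEvent] = {}
    for event in events:
        current = latest.get(event.key)
        if current is None or event.ordering >= current.ordering:
            latest[event.key] = event
    return {key: (None if e.op == "D" else e.row) for key, e in latest.items()}


events = [
    ChangeEvent("k1", "I", 1, {"id": "k1", "v": 1}),
    ChangeEvent("k1", "D", 2),
    ChangeEvent("k1", "U", 3, {"id": "k1", "v": 3}),  # U after D: row reappears
    ChangeEvent("k2", "U", 5, {"id": "k2", "v": 5}),
    ChangeEvent("k2", "D", 6),                        # D after U: row is removed
]
result = merge_events(events)
```

Under this assumed rule, `k1` ends up with its updated row and `k2` ends up deleted, and the outcome does not depend on the order in which the events arrive.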



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
