[
https://issues.apache.org/jira/browse/HUDI-8401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Y Ethan Guo updated HUDI-8401:
------------------------------
Fix Version/s: (was: 1.0.2)
> Support CDC payload, partial updates, and custom logic through native record
> merge API implementation
> -----------------------------------------------------------------------------------------------------
>
> Key: HUDI-8401
> URL: https://issues.apache.org/jira/browse/HUDI-8401
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Y Ethan Guo
> Assignee: Y Ethan Guo
> Priority: Blocker
> Labels: story
> Fix For: 1.1.0
>
>
> With the move towards making partial updates a first-class citizen that does
> not need any special payloads/merges, we need to move all the CDC payloads to
> transformers in the Hudi Streamer and SQL write paths, and provide migration
> instructions to users.
> # Partial update has been implemented for the Spark SQL source as follows:
> ## Configuration {{hoodie.write.partial.update.schema}} is used for
> partial update.
> ## {{ExpressionPayload}} creates the writer schema based on the
> configuration.
> ## {{HoodieAppendHandle}} creates the log file based on the configuration
> and the corresponding partial schema.
> ## Currently this handle assumes these records are all update records.
> ## We need to understand whether ExpressionPayload/SQL Merger is needed going
> forward.
> # For DeltaStreamer, our goal is to remove all siloed CDC payloads, e.g.,
> Debezium or AWS DMS, and to provide CDC data as {{InternalRow}} type.
> Therefore,
> ## The {{transformer}} in DeltaStreamer prepares the data according to the
> types of the sources.
> ## Initially, it's okay to just support full row updates/deletes/...
> # Audit all of them: they should properly combine inserts, updates, and
> deletes (I/U/D) into data and delete blocks, such that U-after-D and
> D-after-U scenarios are handled as expected.
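
The I/U/D combination described in the last item can be illustrated with a minimal sketch. This is not Hudi code and does not use the record merge API: {{ChangeEvent}}, {{merge_events}}, and the integer {{ordering}} field are hypothetical names chosen for illustration, and the sketch assumes every event carries an ordering value (for example, a source log sequence number) and that updates may carry only the changed columns.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC event used for illustration; not a Hudi class."""
    key: str
    op: str                                  # "I" (insert), "U" (update), "D" (delete)
    ordering: int                            # assumed ordering value from the source
    fields: Optional[Dict[str, Any]] = None  # changed columns; None for deletes


def merge_events(events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, Any]]:
    """Apply events in ordering-value order and return the surviving rows by key."""
    rows: Dict[str, Dict[str, Any]] = {}
    # sorted() is stable, so events with equal ordering keep their arrival order.
    for ev in sorted(events, key=lambda e: e.ordering):
        if ev.op == "D":
            # D after U (or after I): the row is removed.
            rows.pop(ev.key, None)
        elif ev.op in ("I", "U"):
            # Partial update: overlay only the supplied columns on the current row.
            # U after D: there is no current row, so the update starts a new one.
            base = rows.get(ev.key, {})
            rows[ev.key] = {**base, **(ev.fields or {})}
        else:
            raise ValueError(f"unknown op: {ev.op!r}")
    return rows


events = [
    ChangeEvent("a", "I", 1, {"id": 1, "name": "x", "qty": 1}),
    ChangeEvent("a", "U", 2, {"qty": 5}),        # partial update of one column
    ChangeEvent("b", "I", 1, {"id": 2}),
    ChangeEvent("b", "U", 3, {"name": "late"}),  # arrives before the delete, ordered after it
    ChangeEvent("b", "D", 2),
    ChangeEvent("c", "U", 1, {"id": 3}),
    ChangeEvent("c", "D", 2),                    # D after U
]
result = merge_events(events)
```

In this sketch, key "a" keeps its original columns with {{qty}} overwritten, key "b" is recreated from the update that is ordered after its delete, and key "c" is absent. Whether an update following a delete should recreate the row or be dropped is a policy decision; the sketch picks recreation only to keep the example short.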