[
https://issues.apache.org/jira/browse/HUDI-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Chandar updated HUDI-7678:
---------------------------------
Description:
With the move towards making partial updates a first-class citizen that does
not need any special payloads/merges, we need to move all the CDC payloads to
be transformers in the Hudi Streamer and SQL write paths, along with migration
instructions for users.
# Partial update has been implemented for the Spark SQL source as follows:
## The configuration {{hoodie.write.partial.update.schema}} is used for
partial update.
## {{ExpressionPayload}} creates the writer schema based on the configuration.
## {{HoodieAppendHandle}} creates the log file based on the configuration and
the corresponding partial schema.
## Currently this handle assumes these records are all update records.
## We need to understand whether ExpressionPayload/SQL Merger is needed going
forward.
# For DeltaStreamer, our goal is to remove all siloed CDC payloads, e.g.,
Debezium or AWSDMS, and to provide CDC data as {{InternalRow}} type. Therefore:
## The {{transformer}} in DeltaStreamer prepares the data according to the
types of the sources.
## Initially, it's okay to just support full row updates/deletes/...
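To make the intent of the two items above concrete, here is a minimal, language-neutral sketch in Python using plain dicts. It is not Hudi code: {{flatten_change_event}}, {{apply_row}} and {{merge_partial}} are hypothetical names, the "id" record key is an assumption, and the input shape assumes a Debezium-style envelope with "op", "before" and "after" fields. Marking deletes with a {{_hoodie_is_deleted}} column is likewise only an assumption about how a transformer might hand deletes to the writer.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def flatten_change_event(event: Row) -> Row:
    """Turn a Debezium-style envelope into one flat row (hypothetical helper).

    Deletes carry the "before" image; every other op carries the "after" image.
    """
    is_delete = event["op"] == "d"
    image = event["before"] if is_delete else event["after"]
    row = dict(image or {})
    # Assumption: deletes are flagged with a boolean column instead of a payload class.
    row["_hoodie_is_deleted"] = is_delete
    return row


def apply_row(table: Dict[Any, Row], row: Row, key: str = "id") -> None:
    """Full-row semantics: a delete drops the key, anything else replaces the row."""
    if row["_hoodie_is_deleted"]:
        table.pop(row[key], None)
    else:
        table[row[key]] = row


def merge_partial(existing: Optional[Row], partial: Row) -> Row:
    """Partial-update semantics: only the columns present in `partial` are overwritten."""
    merged = dict(existing or {})
    merged.update(partial)
    return merged


if __name__ == "__main__":
    table: Dict[Any, Row] = {}
    insert = {"op": "c", "before": None, "after": {"id": 1, "name": "a"}}
    delete = {"op": "d", "before": {"id": 1, "name": "a"}, "after": None}
    apply_row(table, flatten_change_event(insert))
    print(table)
    apply_row(table, flatten_change_event(delete))
    print(table)
```

The point of the sketch is that source-specific logic lives only in the flattening step, so the write path sees ordinary rows and needs no source-specific payload class.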
> Finalize the Merger APIs and make a plan for moving over all existing
> built-in, custom payloads.
> ------------------------------------------------------------------------------------------------
>
> Key: HUDI-7678
> URL: https://issues.apache.org/jira/browse/HUDI-7678
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Vinoth Chandar
> Assignee: Vinoth Chandar
> Priority: Major
> Fix For: 1.0.0
>