[
https://issues.apache.org/jira/browse/HUDI-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Y Ethan Guo updated HUDI-7678:
------------------------------
Description:
This ticket is to make sure that in Hudi 1.0, all existing built-in and custom
payload classes still work, providing the same functionality as in 0.x
releases. They can be supported either through the new record merge API
implementation, or through a retrofit of the existing payload class
implementation.
We'll fully migrate all existing payload logic to the new record merge
implementation in HUDI-8401. That is, with the move towards making partial
updates a first-class citizen that does not need any special payloads/merges,
we need to move all CDC payloads to transformers in the Hudi Streamer and SQL
write paths, along with migration instructions for users.
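To illustrate the two options, here is a self-contained sketch of 0.x
overwrite-with-latest payload semantics (what {{OverwriteWithLatestAvroPayload}}
provides) expressed in the record-merger style. All names below are
illustrative, not Hudi's actual interfaces: the real merge API operates on
{{HoodieRecord}} and Avro schemas, whereas this sketch reduces a record to a
key, an ordering value, a delete flag, and a data string.

```java
import java.util.Optional;

class MergeSketch {
    // Illustrative stand-in for a Hudi record: key, precombine/ordering
    // value, delete marker, and a placeholder for the row data.
    record Rec(String key, long orderingValue, boolean isDelete, String data) {}

    // Overwrite-with-latest semantics: the record with the larger ordering
    // value wins (newer wins ties); if the winner is a delete, return empty
    // so the record is dropped from the merged result.
    static Optional<Rec> merge(Rec older, Rec newer) {
        Rec winner = newer.orderingValue() >= older.orderingValue() ? newer : older;
        return winner.isDelete() ? Optional.empty() : Optional.of(winner);
    }
}
```

A payload-class retrofit would keep the old payload's combine methods and
adapt them behind a merger like this, so both paths can produce identical
results for existing tables.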
was:
With the move towards making partial updates a first-class citizen that does
not need any special payloads/merges, we need to move all CDC payloads to
transformers in the Hudi Streamer and SQL write paths, along with migration
instructions for users.
# partial update has been implemented for Spark SQL source as follows:
## Configuration {{hoodie.write.partial.update.schema}} is used for partial
update.
## {{ExpressionPayload}} creates the writer schema based on the configuration.
## {{HoodieAppendHandle}} creates the log file based on the configuration and
the corresponding partial schema.
## Currently this handle assumes these records are all update records.
## We need to understand whether {{ExpressionPayload}}/SQL merger is needed
going forward.
# For DeltaStreamer, our goal is to remove all siloed CDC payloads, e.g.,
Debezium or AWSDMS, and to provide CDC data as {{InternalRow}} type. Therefore,
## The {{transformer}} in DeltaStreamer prepares the data according to the
types of the sources.
## Initially, it's okay to just support full-row updates/deletes/...
# Audit that all of them properly combine I/U/D records into data and delete
blocks, such that U-after-D and D-after-U scenarios are handled as expected.
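To make the ordering concern in the last item concrete, here is a minimal,
self-contained sketch (illustrative names, not Hudi code) of applying I/U/D
change events in arrival order. It shows why block ordering matters: an update
after a delete re-creates the row, while a delete after an update removes it,
so replaying the same events in a different order yields a different table
state.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CdcApplySketch {
    enum Op { INSERT, UPDATE, DELETE }

    // One CDC change event: operation type, record key, and new value
    // (null for deletes).
    record Change(Op op, String key, String value) {}

    // Apply changes strictly in arrival order. INSERT and UPDATE are both
    // treated as upserts in this sketch; DELETE removes the key. The final
    // state is therefore sensitive to how I/U/D events are ordered across
    // data and delete blocks.
    static Map<String, String> apply(List<Change> changes) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Change c : changes) {
            switch (c.op()) {
                case DELETE -> table.remove(c.key());
                case INSERT, UPDATE -> table.put(c.key(), c.value());
            }
        }
        return table;
    }
}
```

An audit along these lines would check that, for each payload/merger, a
U-after-D sequence leaves the row present with the updated value and a
D-after-U sequence leaves it absent.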
> Finalize the Merger APIs and make a plan for moving over all existing
> built-in, custom payloads.
> ------------------------------------------------------------------------------------------------
>
> Key: HUDI-7678
> URL: https://issues.apache.org/jira/browse/HUDI-7678
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Vinoth Chandar
> Assignee: Y Ethan Guo
> Priority: Blocker
> Fix For: 1.0.0
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> This ticket is to make sure that in Hudi 1.0, all existing built-in and custom
> payload classes still work, providing the same functionality as in 0.x
> releases. They can be supported either through the new record merge API
> implementation, or through a retrofit of the existing payload class
> implementation.
> We'll fully migrate all existing payload logic to the new record merge
> implementation in HUDI-8401. That is, with the move towards making partial
> updates a first-class citizen that does not need any special payloads/merges,
> we need to move all CDC payloads to transformers in the Hudi Streamer and SQL
> write paths, along with migration instructions for users.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)