[
https://issues.apache.org/jira/browse/HUDI-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843110#comment-17843110
]
Vinoth Chandar commented on HUDI-7229:
--------------------------------------
Punting this to 1.1
# [1.1] Implement support on top of data blocks.
## we need to pass the changed-column information and the operation all the way to
write handles, using a field in HoodieRecord
## ...
# [1.1] Implement support on top of cdc data blocks.
## we can track similar bitmaps for cdc data blocks as well
## we need to extend the new file group reader to also merge base and cdc
blocks (not just base and data blocks).
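
To make the bitmap idea above concrete, here is a minimal sketch in plain Python, not Hudi code. It assumes each log record carries an operation flag and a bitmask of changed columns, and that a reader applies those records in order on top of the base row. The names PartialRecord, SCHEMA, make_mask and merge are hypothetical and exist only for illustration; how Hudi will actually carry this in HoodieRecord is still open.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical column order; a real table would take this from its schema.
SCHEMA = ["id", "name", "email", "balance"]


@dataclass
class PartialRecord:
    """Hypothetical log record for illustration; this is not a Hudi class."""
    key: str
    operation: str          # "I" (insert), "U" (update) or "D" (delete)
    changed_mask: int       # bit i set => SCHEMA[i] is present in `values`
    values: Dict[str, Any]  # holds only the changed columns


def make_mask(columns: List[str]) -> int:
    """Build a bitmask with one bit set per changed column."""
    mask = 0
    for col in columns:
        mask |= 1 << SCHEMA.index(col)
    return mask


def merge(base: Optional[Dict[str, Any]],
          log: List[PartialRecord]) -> Optional[Dict[str, Any]]:
    """Apply log records in order on top of the base row."""
    row = dict(base) if base is not None else None
    for rec in log:
        if rec.operation == "D":
            row = None  # a delete drops the row entirely
            continue
        if row is None:
            # Insert (or update after delete): start from an all-null row.
            row = {col: None for col in SCHEMA}
        for i, col in enumerate(SCHEMA):
            if rec.changed_mask & (1 << i):
                row[col] = rec.values[col]
    return row


base_row = {"id": 1, "name": "Ana", "email": "ana@example.com", "balance": 10}
log_records = [
    PartialRecord("1", "U", make_mask(["balance"]), {"balance": 25}),
    PartialRecord("1", "U", make_mask(["email"]), {"email": "ana@new.example.com"}),
]
merged = merge(base_row, log_records)
```

In this sketch the two updates touch different columns, and the merged row picks up both changes while keeping the untouched columns from the base row. The same loop would apply to cdc blocks if they carried a comparable bitmask.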
> Enable partial updates for CDC work payload
> -------------------------------------------
>
> Key: HUDI-7229
> URL: https://issues.apache.org/jira/browse/HUDI-7229
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Lin Liu
> Assignee: Vinoth Chandar
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
>
> OLTP workloads on upstream databases often update, delete, or insert different
> columns in the table on each operation. Currently, Hudi can only support
> partial updates in cases where the same columns are being mutated in a given
> write to Hudi (e.g., Spark SQL ETLs with MIT or Update statements). Here, we
> explore what it takes to support a smarter storage format that encodes only
> the changed columns into the log, along with the different implementations.
> h2. Goals
> # Enable partial update functionality for all existing and potential future
> CDC workloads without huge modification or duplication.
> # Performance parity with current full-record updates or partial updates
> across the same set of columns
> # Exhibit reduction in storage costs, by only storing the changed columns.
> # Should also result in computation cost reductions by scanning/processing
> less data
> # Should not affect the scalability of the existing ingestion system.
> The number of files generated for partial update should not increase
> dramatically.
>
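
As a rough illustration of the description above (encoding only the changed columns per operation), the following Python sketch derives an operation and a changed-column payload from a CDC before/after image pair. It is an assumption-laden example, not Hudi's implementation: encode_change is a hypothetical helper, and the "I"/"U"/"D" flags are illustrative.

```python
from typing import Any, Dict, Optional, Tuple


def encode_change(before: Optional[Dict[str, Any]],
                  after: Optional[Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Return (operation, payload) where payload holds only changed columns."""
    if before is None and after is None:
        raise ValueError("at least one of before/after must be present")
    if before is None:
        return "I", dict(after)  # insert: every column is new
    if after is None:
        return "D", {}           # delete: no column values needed
    # Update: keep only columns whose value differs from the before image.
    changed = {
        col: val
        for col, val in after.items()
        if col not in before or before[col] != val
    }
    return "U", changed


before_image = {"id": 1, "name": "Ana", "balance": 10}
after_image = {"id": 1, "name": "Ana", "balance": 25}
operation, payload = encode_change(before_image, after_image)
```

Here the update payload contains just the balance column, which is the kind of per-record column subset the goals above aim to store instead of the full row.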
--
This message was sent by Atlassian Jira
(v8.20.10#820010)