[
https://issues.apache.org/jira/browse/HUDI-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853214#comment-17853214
]
Shiyan Xu commented on HUDI-4705:
---------------------------------
[~lizhiqiang] [[email protected]] To clarify: CDC for Spark does work on MOR
tables; the current implementation just uses the write-on-indexing strategy (ref:
[https://github.com/apache/hudi/blob/master/rfc/rfc-51/rfc-51.md#persisting-cdc-in-mor-write-on-indexing-vs-write-on-compaction]).
We want to unify the implementation on write-on-compaction, which would also let
the Flink writer work. (The write-on-indexing strategy does not work for Flink, as
explained in the RFC.)
> Support Write-on-compaction mode when query cdc on MOR tables
> -------------------------------------------------------------
>
> Key: HUDI-4705
> URL: https://issues.apache.org/jira/browse/HUDI-4705
> Project: Apache Hudi
> Issue Type: New Feature
> Components: compaction, spark, table-service
> Reporter: Yann Byron
> Priority: Major
>
> When querying CDC on MOR tables, the initial implementation uses the
> `Write-on-indexing` approach, extracting the CDC data by merging the base file
> and log files on the fly.
> This ticket adds support for the `Write-on-compaction` approach, which gets the
> CDC data simply by reading the persisted CDC files that are written during
> compaction.
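
To make the write-on-compaction idea concrete, here is a toy sketch in plain Python. It is not Hudi code: the function name `compact_with_cdc`, the dict-based "base file", and the tuple-based "log file" are all hypothetical stand-ins chosen for illustration. It only shows the general shape, namely that compaction already sees both the old and new value of each key, so it can persist CDC entries as a by-product that a later CDC query simply reads back.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical CDC entry layout: (op, key, before_value, after_value)
CdcEntry = Tuple[str, str, Optional[int], Optional[int]]


def compact_with_cdc(
    base: Dict[str, int],
    log: List[Tuple[str, Optional[int]]],
) -> Tuple[Dict[str, int], List[CdcEntry]]:
    """Merge log records into the base snapshot and collect CDC entries.

    `base` stands in for a base file (key -> value); `log` stands in for
    log-file records in write order, where a value of None means delete.
    """
    merged = dict(base)  # new base snapshot produced by compaction
    cdc: List[CdcEntry] = []  # entries that would be persisted as a CDC file
    for key, value in log:
        before = merged.get(key)
        if value is None:
            # Delete: only emit a change if the key actually existed.
            if before is not None:
                cdc.append(("d", key, before, None))
                del merged[key]
        elif before is None:
            cdc.append(("i", key, None, value))
            merged[key] = value
        else:
            cdc.append(("u", key, before, value))
            merged[key] = value
    return merged, cdc


if __name__ == "__main__":
    new_base, cdc_entries = compact_with_cdc(
        {"a": 1, "b": 2},
        [("a", 10), ("b", None), ("c", 3)],
    )
    print(new_base)
    print(cdc_entries)
```

In this sketch a CDC reader never touches the base or log data; it only consumes the list returned alongside the compacted snapshot, which mirrors the "read the persisted CDC files" behaviour described in the ticket.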