[ 
https://issues.apache.org/jira/browse/HUDI-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853214#comment-17853214
 ] 

Shiyan Xu commented on HUDI-4705:
---------------------------------

[~lizhiqiang] [[email protected]] To clarify, CDC for Spark does work on MOR tables; 
it is just that the current implementation uses the write-on-indexing strategy (ref: 
[https://github.com/apache/hudi/blob/master/rfc/rfc-51/rfc-51.md#persisting-cdc-in-mor-write-on-indexing-vs-write-on-compaction]).

 

We want to unify the implementation on write-on-compaction, which would allow the 
Flink writer to work too. (The write-on-indexing strategy does not work for Flink, 
as explained in the RFC.)
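
As a rough illustration of the difference (a toy model only, not Hudi's actual code; the function name, record layout, and op labels are all hypothetical), the sketch below derives change records by merging a base snapshot with log records. Deriving CDC on the fly would repeat this merge for every query, whereas persisting the change records when compaction performs the merge lets later queries just read them back.

```python
def merge_with_cdc(base, log):
    """Apply log records to a base snapshot; return (new_base, cdc_records).

    base: dict mapping record key -> value (stands in for a base file)
    log:  list of (key, value) pairs in commit order; a value of None
          means delete (stands in for log files)
    """
    merged = dict(base)
    cdc = []
    for key, after in log:
        before = merged.get(key)
        if after is None:
            # Delete: only emit a change if the key actually existed.
            if key in merged:
                del merged[key]
                cdc.append({"op": "d", "key": key, "before": before, "after": None})
        elif key in merged:
            merged[key] = after
            cdc.append({"op": "u", "key": key, "before": before, "after": after})
        else:
            merged[key] = after
            cdc.append({"op": "i", "key": key, "before": None, "after": after})
    return merged, cdc


base = {"k1": "a", "k2": "b"}
log = [("k1", "a2"), ("k3", "c"), ("k2", None)]

# On-the-fly derivation: every CDC query repeats the merge.
_, cdc_on_read = merge_with_cdc(base, log)

# Compaction-time persistence: merge once, keep the change records,
# and let later queries read the stored records directly.
new_base, persisted_cdc = merge_with_cdc(base, log)
```

Both paths yield the same change records in this toy model; what differs is when the merge cost is paid and whether the records are stored.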

> Support Write-on-compaction mode when query cdc on MOR tables
> -------------------------------------------------------------
>
>                 Key: HUDI-4705
>                 URL: https://issues.apache.org/jira/browse/HUDI-4705
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: compaction, spark, table-service
>            Reporter: Yann Byron
>            Priority: Major
>
> When querying CDC on MOR tables, the initial implementation uses the 
> `Write-on-indexing` approach, extracting the CDC data by merging the base file 
> and log files on the fly.
> This ticket proposes supporting the `Write-on-compaction` approach, which gets 
> the CDC data just by reading the persisted CDC files that are written during 
> the compaction operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
