[ https://issues.apache.org/jira/browse/FLINK-36778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shaohui hong updated FLINK-36778:
---------------------------------
          Component/s: Flink CDC
    Affects Version/s: 3.0.0
          Description: 
If a user does not specify a chunk key column in OracleIncrementalSource, the
Oracle CDC connector (flink-connector-oracle-cdc) chooses rowid as the chunk key
column. Everything is correct while the snapshot data is being read, but things
go wrong once the split moves on to the stream backfill phase.

The data of a captured table is split into chunks using the chunk key column.
Each snapshot chunk is processed in four steps: determining the low watermark,
snapshotting the data, determining the high watermark, and backfilling from the
stream. All output elements are put into a queue and processed by the function
pollSplitRecords defined in IncrementalSourceScanFetcher.java. The queue has the
following layout:

[low watermark event][snapshot events][high watermark event][change events][end watermark event]
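The queue layout above can be modeled roughly as follows. This is a hypothetical sketch, not the code of pollSplitRecords: the `ChunkQueue` class, the `Kind` enum, the `Event` and `Segments` records and the `split` method are illustrative names that do not exist in Flink CDC. It checks that the watermarks arrive in the stated order and separates the snapshot events (between the low and high watermarks) from the change events used for backfill (between the high and end watermarks).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of one chunk's queue:
 * [low watermark][snapshot events][high watermark][change events][end watermark].
 * Kind, Event, Segments and split() are illustrative names, not Flink CDC APIs.
 */
public class ChunkQueue {
    enum Kind { LOW_WATERMARK, SNAPSHOT, HIGH_WATERMARK, CHANGE, END_WATERMARK }

    record Event(Kind kind, String payload) {}

    /** Snapshot events and the change events used for stream backfill. */
    record Segments(List<Event> snapshot, List<Event> backfill) {}

    private static void require(boolean ok, String message) {
        if (!ok) {
            throw new IllegalStateException(message);
        }
    }

    /** Checks the watermark order and splits the queue into its two data segments. */
    static Segments split(List<Event> queue) {
        List<Event> snapshot = new ArrayList<>();
        List<Event> backfill = new ArrayList<>();
        Kind phase = null; // last watermark seen so far
        for (Event e : queue) {
            switch (e.kind()) {
                case LOW_WATERMARK -> {
                    require(phase == null, "low watermark must come first");
                    phase = Kind.LOW_WATERMARK;
                }
                case HIGH_WATERMARK -> {
                    require(phase == Kind.LOW_WATERMARK, "high watermark must follow the low watermark");
                    phase = Kind.HIGH_WATERMARK;
                }
                case END_WATERMARK -> {
                    require(phase == Kind.HIGH_WATERMARK, "end watermark must follow the high watermark");
                    phase = Kind.END_WATERMARK;
                }
                case SNAPSHOT -> {
                    require(phase == Kind.LOW_WATERMARK, "snapshot event outside the low/high watermarks");
                    snapshot.add(e);
                }
                case CHANGE -> {
                    require(phase == Kind.HIGH_WATERMARK, "change event outside the high/end watermarks");
                    backfill.add(e);
                }
            }
        }
        require(phase == Kind.END_WATERMARK, "queue must end with the end watermark");
        return new Segments(snapshot, backfill);
    }

    public static void main(String[] args) {
        List<Event> queue = List.of(
                new Event(Kind.LOW_WATERMARK, "lw"),
                new Event(Kind.SNAPSHOT, "row 1"),
                new Event(Kind.SNAPSHOT, "row 2"),
                new Event(Kind.HIGH_WATERMARK, "hw"),
                new Event(Kind.CHANGE, "update row 2"),
                new Event(Kind.END_WATERMARK, "end"));
        Segments segments = split(queue);
        // prints "snapshot=2, backfill=1"
        System.out.println("snapshot=" + segments.snapshot().size()
                + ", backfill=" + segments.backfill().size());
    }
}
```

Failing fast on an out-of-order queue keeps the model honest about the layout; the real fetcher may handle unexpected ordering differently.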

The snapshot data is put into a map named outputBuffer, whose key is derived from
the chunk key column and whose value is a record of the captured table. If rowid
is used as the chunk key column, the key of outputBuffer is the rowid. In this
situation, when the stream backfill data is used to rewrite outputBuffer, the key
of each change record is null if the captured table does not define a primary
key, or is built from the table's primary key columns if it does. Either way it
does not match the rowid key, so the corresponding entry in outputBuffer cannot
be found and rewritten.
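To make the key mismatch concrete, here is a simplified, hypothetical Java model. It assumes the backfill step behaves like an upsert by key (put for an update, remove for a delete), which is a simplification of the real logic. `BackfillKeyMismatch`, `Op`, `Change`, `backfill` and the rowid strings are made up for illustration; the connector's buffer presumably holds Debezium/Kafka Connect records rather than plain strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of rewriting outputBuffer during stream backfill.
 * Keys and values are plain strings; backfill is modeled as an upsert by key.
 */
public class BackfillKeyMismatch {
    enum Op { UPDATE, DELETE }

    /** A change event read between the high and end watermarks. */
    record Change(Op op, String key, String after) {}

    /** Applies the backfill changes to a copy of the snapshot buffer, matching by key. */
    static Map<String, String> backfill(Map<String, String> outputBuffer, List<Change> changes) {
        Map<String, String> result = new HashMap<>(outputBuffer); // HashMap permits a null key
        for (Change c : changes) {
            switch (c.op()) {
                case UPDATE -> result.put(c.key(), c.after()); // rewrites the row only if the key matches
                case DELETE -> result.remove(c.key());         // no-op if the key is absent
            }
        }
        return result;
    }

    /** Snapshot buffer keyed by (made-up) rowids, as when rowid is the chunk key column. */
    static Map<String, String> snapshotByRowid() {
        Map<String, String> buffer = new HashMap<>();
        buffer.put("AAAR3sAAEAAAACXAAA", "id=1,name=a");
        buffer.put("AAAR3sAAEAAAACXAAB", "id=2,name=b");
        return buffer;
    }

    public static void main(String[] args) {
        // Keys match (rowid on both sides): the snapshot row is rewritten in place.
        System.out.println(backfill(snapshotByRowid(),
                List.of(new Change(Op.UPDATE, "AAAR3sAAEAAAACXAAA", "id=1,name=a2"))));
        // Key built from primary key "id": the stale row survives next to a new entry.
        System.out.println(backfill(snapshotByRowid(),
                List.of(new Change(Op.UPDATE, "id=1", "id=1,name=a2"))));
        // Same mismatch for a delete: the deleted row is still emitted.
        System.out.println(backfill(snapshotByRowid(),
                List.of(new Change(Op.DELETE, "id=2", null))));
        // No primary key, so the key is null: updates to different rows collide.
        System.out.println(backfill(snapshotByRowid(),
                List.of(new Change(Op.UPDATE, null, "id=1,name=a2"),
                        new Change(Op.UPDATE, null, "id=2,name=b2"))));
    }
}
```

Under this model only the first case rewrites the snapshot row. In the primary-key cases the stale rowid-keyed rows are never touched, and in the null-key case the second update overwrites the first, so the update to id=1 is lost.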
              Summary: losing data when using rowid as the chunk key column in 
OracleIncrementalSource.java  (was: The default implementation )

> losing data when using rowid as the chunk key column in 
> OracleIncrementalSource.java
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-36778
>                 URL: https://issues.apache.org/jira/browse/FLINK-36778
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: 3.0.0
>            Reporter: shaohui hong
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
