[
https://issues.apache.org/jira/browse/FLINK-36778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shaohui hong updated FLINK-36778:
---------------------------------
Component/s: Flink CDC
Affects Version/s: 3.0.0
Description:
If a user does not specify a chunk key column in OracleIncrementalSource, the
flink-connector-oracle-cdc connector chooses ROWID as the chunk key column.
Snapshotting works correctly, but things go wrong once processing moves to the
stream backfill phase.
The data of a captured table is split into chunks using the chunk key column.
Processing each snapshot chunk takes four steps: determining the low
watermark, snapshotting the data, determining the high watermark, and
performing the stream backfill. All output elements are put into a queue and
processed by the pollSplitRecords function defined in
IncrementalSourceScanFetcher.java. The queue has the following format:
[low watermark event][snapshot events][high watermark event][change events][end watermark event]
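To make the queue layout concrete, here is a minimal sketch of how a consumer could separate the snapshot events from the backfill change events by their position relative to the watermark events. It is not the code of pollSplitRecords; the class ChunkQueueLayout, the Kind enum, and the Event and Phases types are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkQueueLayout {
    // Hypothetical event kinds; DATA covers both snapshot rows and change events.
    enum Kind { LOW_WATERMARK, HIGH_WATERMARK, END_WATERMARK, DATA }

    static final class Event {
        final Kind kind;
        final String payload;

        Event(Kind kind, String payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    static final class Phases {
        final List<Event> snapshotEvents = new ArrayList<>();
        final List<Event> changeEvents = new ArrayList<>();
    }

    // Data events before the high watermark are snapshot events;
    // data events between the high and end watermarks are backfill change events.
    static Phases splitPhases(List<Event> queue) {
        Phases phases = new Phases();
        boolean pastHighWatermark = false;
        for (Event e : queue) {
            switch (e.kind) {
                case LOW_WATERMARK:
                    break;
                case HIGH_WATERMARK:
                    pastHighWatermark = true;
                    break;
                case END_WATERMARK:
                    return phases;
                default:
                    if (pastHighWatermark) {
                        phases.changeEvents.add(e);
                    } else {
                        phases.snapshotEvents.add(e);
                    }
            }
        }
        throw new IllegalStateException("queue ended without an end watermark event");
    }
}
```

In this model the phase of a data event is decided purely by where it sits in the queue, which is why the watermark events have to be emitted in the order listed above.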
The snapshot data is put into a map named outputBuffer, which is keyed by the
chunk key column and whose values are records of the captured table. If ROWID
is used as the chunk key column, the key of outputBuffer is the ROWID. In this
situation, when stream backfill data is used to rewrite outputBuffer, the key
of a change record is null if the captured table defines no primary key, or is
built from the primary key columns if it does. Either way, the key cannot be
found in outputBuffer, so the buffered value is not rewritten.
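The mismatch can be reproduced with a simplified model. The sketch below is not the connector's code: the class BackfillKeyMismatch, the rewrite method, the string keys, and the sample ROWID value are all assumptions made for illustration. It only shows that a buffer keyed by ROWID cannot be updated by a change record whose key is null or derived from the primary key.

```java
import java.util.HashMap;
import java.util.Map;

public class BackfillKeyMismatch {
    // Simplified model of rewriting a buffered snapshot row during backfill.
    // Returns true only if the change key matched a buffered entry.
    static boolean rewrite(Map<String, String> outputBuffer, String changeKey, String newRow) {
        if (changeKey == null || !outputBuffer.containsKey(changeKey)) {
            return false; // no matching snapshot entry, so nothing is rewritten
        }
        outputBuffer.put(changeKey, newRow);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> outputBuffer = new HashMap<>();
        // Snapshot phase: the row is buffered under its ROWID (hypothetical value).
        outputBuffer.put("AAAR3sAAEAAAACXAAA", "id=1, name=old");

        // Backfill phase: the change record is keyed by primary key, or has no key.
        System.out.println(rewrite(outputBuffer, "id=1", "id=1, name=new"));
        System.out.println(rewrite(outputBuffer, null, "id=1, name=new"));

        // The buffered row is still the stale snapshot value.
        System.out.println(outputBuffer.get("AAAR3sAAEAAAACXAAA"));
    }
}
```

Under these assumptions, the rewrite only succeeds when the snapshot records and the change records are keyed the same way, which is the condition the ROWID default breaks.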
Summary: losing data when using rowid as the chunk key column in
OracleIncrementalSource.java (was: The default implementation )
> losing data when using rowid as the chunk key column in
> OracleIncrementalSource.java
> ------------------------------------------------------------------------------------
>
> Key: FLINK-36778
> URL: https://issues.apache.org/jira/browse/FLINK-36778
> Project: Flink
> Issue Type: Bug
> Components: Flink CDC
> Affects Versions: 3.0.0
> Reporter: shaohui hong
> Priority: Major