Yanquan Lv created FLINK-36750:
----------------------------------
Summary: Paimon connector would reuse sequence number when schema
evolution happened
Key: FLINK-36750
URL: https://issues.apache.org/jira/browse/FLINK-36750
Project: Flink
Issue Type: Improvement
Components: Flink CDC
Affects Versions: cdc-3.2.0
Reporter: Yanquan Lv
Fix For: cdc-3.2.1
Attachments: image-2024-11-20-13-00-58-282.png,
image-2024-11-20-13-02-47-612.png, image-2024-11-20-13-04-53-635.png
When schema evolution happens, we prepare a commit and recreate the
FileStoreWrite to obtain the latest schema. However, FileStoreWrite maintains
some information in memory, such as the sequence number, so we can't simply
discard it and create a new one. Instead, we should extract that information
from the old FileStoreWrite and rebuild the new one with it.
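
To make the difference concrete, here is a minimal toy model in Python. It is
not Paimon's API: ToyWriter, recreate_naively and recreate_with_state are
hypothetical names, and the only state modeled is an in-memory sequence
counter. It only illustrates why the counter has to be carried over when the
writer is rebuilt.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWriter:
    """Hypothetical stand-in for a writer that keeps a sequence counter in memory."""
    schema_id: int
    next_seq: int = 0
    records: list = field(default_factory=list)

    def write(self, key, value):
        # Tag each record with the current sequence number, then advance it.
        self.records.append((key, value, self.next_seq))
        self.next_seq += 1


def recreate_naively(old, new_schema_id):
    # Drops the old writer's in-memory state: the counter restarts at 0.
    return ToyWriter(schema_id=new_schema_id)


def recreate_with_state(old, new_schema_id):
    # Carries the counter over, so numbering continues where it left off.
    return ToyWriter(schema_id=new_schema_id, next_seq=old.next_seq)


old = ToyWriter(schema_id=0)
old.write("k1", "a")
old.write("k1", "b")

naive = recreate_naively(old, new_schema_id=1)
naive.write("k1", "c")

restored = recreate_with_state(old, new_schema_id=1)
restored.write("k1", "c")

naive_seq = naive.records[0][2]        # reuses a number the old writer already assigned
restored_seq = restored.records[0][2]  # continues after the old writer's last number
```

In this sketch the naive rebuild hands out a sequence number that the old
writer has already used, while the state-preserving rebuild keeps the numbers
strictly increasing across the schema change.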
The sequence number is used to determine the order of records that share the
same primary key. If we don't strictly maintain this order, it may lead to
unexpected results.
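
As a rough illustration of why the ordering matters, the sketch below merges
records by primary key and keeps the one with the highest sequence number.
This is a simplified assumption about the merge, not Paimon's actual merge
logic; merge_latest and the sample records are made up for the example.

```python
def merge_latest(records):
    """Per key, keep the value with the highest sequence number.

    On a tie, the record seen first is kept (an assumption of this sketch).
    """
    latest = {}
    for key, value, seq in records:
        if key not in latest or seq > latest[key][1]:
            latest[key] = (value, seq)
    return {key: value for key, (value, _) in latest.items()}


# Records are (primary key, value, sequence number).
file_2 = [("k1", "old", 0), ("k1", "newer", 1)]

# Written after the schema change with a reset counter: overlaps with file_2.
file_3_overlapping = [("k1", "newest", 0)]

# Written after the schema change with the counter carried over.
file_3_increasing = [("k1", "newest", 2)]

stale = merge_latest(file_2 + file_3_overlapping)
fresh = merge_latest(file_2 + file_3_increasing)
```

With the overlapping numbers the last write loses to an older record, and with
increasing numbers it wins as expected.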
The following pictures show the problem we are currently facing:
1) Schema evolution happened between the second and third files
!image-2024-11-20-13-04-53-635.png!
2) The sequence numbers here are expected to be increasing; however, the third
file overlaps with the second file.
!image-2024-11-20-13-02-47-612.png!
Because the sequence numbers overlap, a read may return the update-before data
instead of the latest value.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)