[
https://issues.apache.org/jira/browse/FLINK-36750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruan Hang resolved FLINK-36750.
-------------------------------
Resolution: Fixed
> Paimon connector would reuse sequence number when schema evolution happened
> ---------------------------------------------------------------------------
>
> Key: FLINK-36750
> URL: https://issues.apache.org/jira/browse/FLINK-36750
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Affects Versions: cdc-3.2.0
> Reporter: Yanquan Lv
> Assignee: Yanquan Lv
> Priority: Major
> Labels: pull-request-available
> Fix For: cdc-3.3.0, cdc-3.2.1
>
> Attachments: image-2024-11-20-13-00-58-282.png,
> image-2024-11-20-13-02-47-612.png, image-2024-11-20-13-04-53-635.png
>
>
> When schema evolution happens, we prepare a commit and recreate a new
> FileStoreWrite to obtain the latest schema. However, FileStoreWrite maintains
> some information, such as the sequence number, in memory, so we can't simply
> remove it and recreate a new FileStoreWrite. Instead, we should extract this
> information from the old write and rebuild the new one with it.
> The sequence number is used to determine the order of two records with
> identical primary keys. If we don't strictly maintain this order, it may lead
> to unexpected results.
> The following pictures show the problem we are currently facing:
> 1) Schema evolution happened between the second and third
> files (`{*}schema_id{*}` changed)
> !image-2024-11-20-13-04-53-635.png!
> 2) The sequence numbers here are expected to be increasing; however, there is
> an overlap in `{*}min_sequence_number{*}` between the third file and the
> second file.
> !image-2024-11-20-13-02-47-612.png!
> Because the sequence numbers are mixed up, a read may return update-before
> data.
>
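The overlap described in the issue can be reproduced with a toy model. The sketch below is not Paimon or Flink CDC code: ToyWriter, FileMeta, run and overlaps are hypothetical names invented for illustration, and the writer simply assigns one sequence number per record. It assumes the recreated writer starts again from 0 unless the old writer's counter is carried over, which is only one plausible way the reuse could arise.

```java
import java.util.ArrayList;
import java.util.List;

public class SequenceNumberSketch {

    /** Hypothetical stand-in for the metadata of one written data file. */
    static final class FileMeta {
        final long schemaId;
        final long minSeq;
        final long maxSeq;

        FileMeta(long schemaId, long minSeq, long maxSeq) {
            this.schemaId = schemaId;
            this.minSeq = minSeq;
            this.maxSeq = maxSeq;
        }

        @Override
        public String toString() {
            return "{schema_id=" + schemaId + ", min_seq=" + minSeq + ", max_seq=" + maxSeq + "}";
        }
    }

    /** Hypothetical writer that keeps its sequence counter only in memory. */
    static final class ToyWriter {
        private final long schemaId;
        private long nextSeq;

        ToyWriter(long schemaId, long startSeq) {
            this.schemaId = schemaId;
            this.nextSeq = startSeq;
        }

        /** Assigns one sequence number per record and flushes them as one file. */
        FileMeta writeFile(int records) {
            long min = nextSeq;
            nextSeq += records;
            return new FileMeta(schemaId, min, nextSeq - 1);
        }

        long nextSequenceNumber() {
            return nextSeq;
        }
    }

    /** Writes two files, simulates a schema change, then writes a third file. */
    static List<FileMeta> run(boolean carryOverState) {
        List<FileMeta> files = new ArrayList<>();
        ToyWriter writer = new ToyWriter(0, 0);
        files.add(writer.writeFile(3));
        files.add(writer.writeFile(3));

        // Schema evolution: the writer is recreated with a new schema id.
        // Assumption: without carrying state over, the counter restarts at 0.
        long start = carryOverState ? writer.nextSequenceNumber() : 0;
        writer = new ToyWriter(1, start);
        files.add(writer.writeFile(3));
        return files;
    }

    /** True if the later file starts at or below the earlier file's max sequence number. */
    static boolean overlaps(FileMeta earlier, FileMeta later) {
        return later.minSeq <= earlier.maxSeq;
    }

    public static void main(String[] args) {
        for (boolean carry : new boolean[] {false, true}) {
            List<FileMeta> files = run(carry);
            System.out.println("carryOverState=" + carry + " files=" + files
                    + " overlap=" + overlaps(files.get(1), files.get(2)));
        }
    }
}
```

In this model, discarding the writer makes the third file's sequence range fall at or below the second file's, while passing the old counter into the new writer keeps the ranges strictly increasing across the schema change.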
--
This message was sent by Atlassian Jira
(v8.20.10#820010)