Re: [PR] [FLINK-37332] Support any column as chunk key column (postgres, oracle, db2, sqlserver) #3922 [flink-cdc]

via GitHub Tue, 25 Mar 2025 21:06:35 -0700


ruanhang1993 commented on PR #3928:
URL: https://github.com/apache/flink-cdc/pull/3928#issuecomment-2753148934


   @SML0127 Thanks for your contribution.
   
   Using a column that is not a primary key column as the chunk key may cause data errors.
   Consider a table whose primary key is `id` and whose chunk key column is `pid`, read as 2 snapshot splits:
   - Split0: 1 < pid <= 3
   - Split1: 3 < pid <= 5
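   To make the split boundaries concrete, here is a minimal Python sketch, assuming the left-open, right-closed ranges listed above. `SPLITS` and `split_for` are hypothetical names for illustration, not Flink CDC APIs. It shows that the split a row belongs to is decided by its current `pid` value, so an update to `pid` can move the same `id` from one split to another.

```python
from typing import Optional

# The two snapshot splits from the example, as left-open, right-closed ranges on pid.
# SPLITS and split_for are hypothetical names used only for illustration.
SPLITS = {"split0": (1, 3), "split1": (3, 5)}


def split_for(pid: int) -> Optional[str]:
    """Return the split whose range lower < pid <= upper contains pid, or None."""
    for name, (lower, upper) in SPLITS.items():
        if lower < pid <= upper:
            return name
    return None


if __name__ == "__main__":
    # The same row (id=0) evaluated with two different pid values.
    print("pid=2 ->", split_for(2))  # split0
    print("pid=4 ->", split_for(4))  # split1
```

   If an update changes a row's `pid` from 2 to 4, that row belongs to Split0 before the change and to Split1 after it. Chunking on the primary key avoids this in the common case, because primary key values are rarely updated.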
   
   Split0 and Split1 are read by different subtasks. An update happens while these snapshot splits are being read, and its binlog offset falls between the low and high watermarks of both splits.
   
![dad](https://github.com/user-attachments/assets/8c548f21-2781-4573-a684-fd8e2376603a)
   
   This update will be handled in the backfill task of both split readers, and it will not be sent again in the incremental phase.
   As a result, Split0's output will contain the record [id=0, pid=2] and Split1's output will contain the record [id=0, pid=4].
   We cannot control the order in which these two records with the same id arrive downstream, so the final pid of id=0 will be either 2 or 4, depending on which record is applied last.
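   The ordering problem can be reproduced with a small Python model. This is a simplified sketch, not Flink CDC's reader or sink code: it assumes the two splits' backfilled output is exactly the two records described above, and that the downstream sink upserts by primary key with last-write-wins semantics. `Record`, `apply_upserts` and `possible_final_pids` are hypothetical helpers.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Record:
    id: int   # primary key
    pid: int  # chunk key column (not the primary key)


# Assumed backfilled output of each split, as described above.
split_outputs = {
    "split0": [Record(id=0, pid=2)],
    "split1": [Record(id=0, pid=4)],
}


def apply_upserts(order):
    """Upsert each split's records into a sink keyed by primary key; the last write wins."""
    sink = {}
    for split in order:
        for rec in split_outputs[split]:
            sink[rec.id] = rec
    return sink


def possible_final_pids():
    """Map every arrival order of the splits' output to the final pid of id=0."""
    return {order: apply_upserts(order)[0].pid for order in permutations(split_outputs)}


if __name__ == "__main__":
    for order, pid in possible_final_pids().items():
        print(" -> ".join(order), "final pid:", pid)
```

   The two arrival orders give different results: pid=4 when Split0's record is applied first, pid=2 when Split1's record is applied first. Nothing in the model fixes which order happens, which is the nondeterminism described above.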
   
   Could we add a warning to the docs about this usage for MySQL and the other connectors?
