ruanhang1993 commented on PR #3928: URL: https://github.com/apache/flink-cdc/pull/3928#issuecomment-2753148934
@SML0127 Thanks for your contribution.

Using a column that is not a primary key column as the chunk key may cause data errors.

Consider a table whose primary key is `id` and whose chunk key column is `pid`, with 2 snapshot splits:

- Split0: 1 < pid <= 3
- Split1: 3 < pid <= 5

Split0 and Split1 are read by different subtasks. Suppose an update happens while these snapshot splits are being read, and its binlog offset lies between the low and high watermarks of both splits. This update will be handled in the backfill task of both split readers, and it will not be sent in the incremental phase.

Split0 will have the record [id=0, pid=2]. Split1 will have the record [id=0, pid=4]. We cannot control the order of these records with the same id, so the final pid of id=0 will be 2 or 4, depending on the order in which the records arrive.

Could we add some warnings in the docs for this usage in the MySQL connector and other connectors?
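
To make the ordering problem concrete, here is a minimal Python sketch. It is not Flink CDC code: `apply_upserts` is a hypothetical stand-in for any downstream sink that upserts by primary key, and the two lists are assumed to be the records emitted by the two split readers in the example above. It only illustrates that the final value depends on arrival order.

```python
from itertools import permutations


def apply_upserts(records):
    """Hypothetical sink: upsert by primary key `id`, last write wins."""
    table = {}
    for rec in records:
        table[rec["id"]] = rec["pid"]
    return table


# Records assumed to be emitted by each split reader after its backfill.
split0_output = [{"id": 0, "pid": 2}]
split1_output = [{"id": 0, "pid": 4}]

# The two subtasks are independent, so either arrival order is possible.
outcomes = set()
for order in permutations([split0_output, split1_output]):
    merged = [rec for split in order for rec in split]
    outcomes.add(apply_upserts(merged)[0])

print(sorted(outcomes))  # [2, 4]
```

Both 2 and 4 are reachable as the final `pid` for `id=0`, which is the inconsistency described above. When the chunk key is the primary key, a given `id` falls into exactly one split, so this cross-split race does not arise.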
