lvyanquan commented on PR #3902: URL: https://github.com/apache/flink-cdc/pull/3902#issuecomment-2639121416
Hi, @JNSimba. I found that the change of https://github.com/apache/flink-cdc/pull/3415 will cause the following problem: 1. Even though there is not record between lowWatermark and highWatermark, we still need a backfill procedure, and the timestamp of highWatermark is greater than lowWatermark. 2. If there is no more record after highWatermark, the backfill procedure will never stop, as we will need to wait for a record that is at or after the highWatermark(as the following code displayed). https://github.com/apache/flink-cdc/blob/7717779ebff3d5900c3adcd06bc860949d543f97/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/debezium/task/MySqlBinlogSplitReadTask.java#L102 I thank that the change of https://github.com/apache/flink-cdc/pull/3415 mainly to solve the problem of the comparson of highWatermark and the last record, but in reality, we don't need to read this record, so we can skip it directly. https://github.com/apache/flink-cdc/blob/7717779ebff3d5900c3adcd06bc860949d543f97/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/debezium/task/MySqlBinlogSplitReadTask.java#L102 Notice that this will not cause data loss, as this record will be read in backfill phase. What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
