lvyanquan commented on PR #3902:
URL: https://github.com/apache/flink-cdc/pull/3902#issuecomment-2639121416

   Hi, @JNSimba.
   
   I found that the change of https://github.com/apache/flink-cdc/pull/3415 
will cause the following problem:
   1. Even though there is not record between lowWatermark and highWatermark, 
we still need a backfill procedure, and the timestamp of highWatermark is 
greater than lowWatermark.
   2. If there is no more record after highWatermark, the backfill procedure 
will never stop, as we will need to wait for a record that is at or after the 
highWatermark(as the following code displayed).
   
https://github.com/apache/flink-cdc/blob/7717779ebff3d5900c3adcd06bc860949d543f97/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/debezium/task/MySqlBinlogSplitReadTask.java#L102
   
   I thank that the change of https://github.com/apache/flink-cdc/pull/3415 
mainly to solve the problem of the comparson of highWatermark and the last 
record, but in reality, we don't need to read this record, so we can skip it 
directly.
   
https://github.com/apache/flink-cdc/blob/7717779ebff3d5900c3adcd06bc860949d543f97/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/debezium/task/MySqlBinlogSplitReadTask.java#L102
   
   Notice that this will not cause data loss, as this record will be read in 
backfill phase.
   
   What do you think?
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to