yuxiqian opened a new pull request, #3831:
URL: https://github.com/apache/flink-cdc/pull/3831

   > Originally reported by @Jzjsnow in 
https://github.com/apache/flink-cdc/pull/3802#issuecomment-2567352155.
   
   Currently, the OceanBase CDC connector determines the offset that marks 
the transition from the "snapshot" stage to the "streaming" stage by a 
timestamp with second precision only.
   
   However, when records are being actively written to the binlog near the 
transition point, duplicate insert events might be emitted:
   
   ```
         Snapshot starts here
                  |
                  v
   ------------> +I[1] -----> +I[2] -----> -D[1] -----> ...
                  ^
                  |
     Transition Point marked here
   ```
   
   Since the test cases have only a few records in the database, the snapshot 
stage completes almost immediately. Thus, record `+I[1]` might be emitted 
twice (first as a snapshot record, then again as a streaming record), because 
it was committed within the same second as the transition point.
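   To make the failure mode concrete, here is a toy model of the hand-off. It is a hypothetical sketch, not the connector's code: it assumes the streaming stage replays every binlog event committed at or after the snapshot start time truncated to whole seconds, and all class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of a second-precision snapshot-to-streaming hand-off. */
public class TransitionPointDemo {

    /** A binlog change event: "+I" (insert) or "-D" (delete) with a commit time in ms. */
    record Event(String op, int key, long commitMillis) {
        String render() {
            return op + "[" + key + "]";
        }
    }

    /** Snapshot stage: the table state as of snapshotMillis, emitted as inserts. */
    static List<String> snapshot(List<Event> binlog, long snapshotMillis) {
        Set<Integer> rows = new LinkedHashSet<>();
        for (Event e : binlog) {
            if (e.commitMillis() > snapshotMillis) continue; // not yet visible
            if (e.op().equals("+I")) rows.add(e.key());
            else rows.remove(e.key());
        }
        List<String> out = new ArrayList<>();
        for (int key : rows) out.add("+I[" + key + "]");
        return out;
    }

    /** Streaming stage: replays the binlog from the snapshot time truncated to whole seconds. */
    static List<String> stream(List<Event> binlog, long snapshotMillis) {
        long transitionMillis = (snapshotMillis / 1000) * 1000; // second precision only
        List<String> out = new ArrayList<>();
        for (Event e : binlog) {
            if (e.commitMillis() >= transitionMillis) out.add(e.render());
        }
        return out;
    }

    /** Everything the source emits: snapshot records followed by streaming records. */
    static List<String> emitted(List<Event> binlog, long snapshotMillis) {
        List<String> all = new ArrayList<>(snapshot(binlog, snapshotMillis));
        all.addAll(stream(binlog, snapshotMillis));
        return all;
    }

    public static void main(String[] args) {
        // Snapshot starts at 10.5 s, only 300 ms after +I[1] was committed.
        List<Event> busy = List.of(
                new Event("+I", 1, 10_200),
                new Event("+I", 2, 10_800),
                new Event("-D", 1, 11_300));
        System.out.println(emitted(busy, 10_500)); // [+I[1], +I[1], +I[2], -D[1]]

        // +I[1] committed at 10.2 s, but the snapshot starts at 11.5 s (a later second).
        List<Event> delayed = List.of(
                new Event("+I", 1, 10_200),
                new Event("+I", 2, 11_800),
                new Event("-D", 1, 12_300));
        System.out.println(emitted(delayed, 11_500)); // [+I[1], +I[2], -D[1]]
    }
}
```

   In the first scenario the transition point is truncated to 10.0 s, so `+I[1]` (committed at 10.2 s) is both part of the snapshot and replayed by the streaming stage. In the second, it falls in an earlier second than the truncated transition point and is emitted only once.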
   
   #3211 should fix this by enforcing "exactly-once" semantics in the OceanBase 
connector, but it is not available yet. For now, we just sleep for a while 
after initializing the database to ensure snapshot records are never 
confused with streaming ones.
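   The workaround amounts to leaving at least one wall-clock second between the last initialization write and the start of the snapshot. Below is a minimal sketch of such a wait; `SnapshotDelay`, `millisToWait` and `initializeThenWait` are hypothetical names, not code from this PR. It assumes the transition point is the snapshot start time truncated to whole seconds and that the test JVM's clock roughly agrees with the OceanBase server's clock, with the margin absorbing small skew. A plain fixed sleep works the same way; computing the wait only avoids sleeping longer than necessary.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical test helper; not taken from the flink-cdc code base. */
public class SnapshotDelay {

    /**
     * Milliseconds to wait so that the current time moves past the second
     * containing lastWriteMillis, plus a safety margin for clock skew.
     */
    static long millisToWait(long lastWriteMillis, long nowMillis, long marginMillis) {
        long nextSecondStart = (lastWriteMillis / 1000 + 1) * 1000;
        return Math.max(0, nextSecondStart - nowMillis) + marginMillis;
    }

    /** Runs the database initialization, then blocks until a later second has begun. */
    static void initializeThenWait(Runnable initializeDatabase, long marginMillis)
            throws InterruptedException {
        initializeDatabase.run();
        // Approximate the last write time by the moment initialization returned.
        long lastWrite = System.currentTimeMillis();
        TimeUnit.MILLISECONDS.sleep(
                millisToWait(lastWrite, System.currentTimeMillis(), marginMillis));
    }
}
```

   For example, with the last write at 10.2 s and the current time at 10.25 s, a 500 ms margin gives a wait of 750 + 500 = 1250 ms, so the snapshot cannot start before 11.5 s.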

