yuxiqian opened a new pull request, #3831:
URL: https://github.com/apache/flink-cdc/pull/3831
> Originally reported by @Jzjsnow in https://github.com/apache/flink-cdc/pull/3802#issuecomment-2567352155.
>
> Currently, the OceanBase CDC connector determines the offset that marks the transition from the "snapshot" stage to the "streaming" stage by a timestamp with second precision only. Because such a timestamp cannot tell apart events that happen within the same second, Binlog records that are actively being inserted near the transition point can be read once by the snapshot and again by the streaming stage, so duplicate insert events might be emitted:
>
> ```
>                     Snapshot starts here
>                              |
>                              v
> ------------> +I[1] -----> +I[2] -----> -D[1] -----> ...
>            ^
>            |
> Transition Point marked here
> ```
>
> Since our test cases don't have many records in the database, the snapshot stage completes almost immediately. Thus, record `+I[1]` might be emitted twice (first as a snapshot record, and later as a streaming record).
>
> #3211 should fix this by enforcing exactly-once semantics in the OceanBase connector, but it is not available yet. For now, we will just sleep for a while after initializing the database, ensuring that snapshot records will never be confused with streaming ones.
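To make the race above concrete, here is a minimal Java sketch under stated assumptions. It is a hypothetical, simplified model, not the connector's code: `TransitionPointDemo`, `Change`, `emitted`, and `truncateToSecond` are invented names; commit times are modeled in milliseconds; the snapshot is assumed to contain every row committed strictly before the snapshot instant; and the transition point is assumed to be that instant truncated to whole seconds, with the streaming stage replaying every change at or after it. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a snapshot-then-binlog reader whose
// transition offset is a second-precision timestamp. Not connector code.
public class TransitionPointDemo {

    // A change event with a millisecond commit time ("+I" insert, "-D" delete).
    record Change(String op, int id, long commitMillis) {}

    // Drop the sub-second part, as a second-precision offset would (assumed truncation).
    static long truncateToSecond(long millis) {
        return (millis / 1000) * 1000;
    }

    // Returns every record the model reader emits, snapshot records first.
    static List<String> emitted(List<Change> log, long snapshotMillis) {
        List<String> out = new ArrayList<>();

        // Snapshot stage: rows that exist at the snapshot instant.
        Set<Integer> live = new LinkedHashSet<>();
        for (Change c : log) {
            if (c.commitMillis() < snapshotMillis) {
                if (c.op().equals("+I")) {
                    live.add(c.id());
                } else if (c.op().equals("-D")) {
                    live.remove(c.id());
                }
            }
        }
        for (int id : live) {
            out.add("snapshot +I[" + id + "]");
        }

        // Streaming stage: replay the log from the truncated transition point,
        // which can lie up to one second before the snapshot instant.
        long transition = truncateToSecond(snapshotMillis);
        for (Change c : log) {
            if (c.commitMillis() >= transition) {
                out.add("stream " + c.op() + "[" + c.id() + "]");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // No wait: +I[1] commits at 10.2 s, the snapshot is taken at 10.5 s, and the
        // transition point truncates to 10.0 s, so +I[1] is emitted by both stages.
        List<Change> noWait = List.of(
                new Change("+I", 1, 10_200),
                new Change("+I", 2, 10_900),
                new Change("-D", 1, 11_500));
        System.out.println(emitted(noWait, 10_500));

        // Waiting after initialization: the snapshot (12.5 s) and its truncated
        // transition point (12.0 s) both fall after the initial insert.
        List<Change> withWait = List.of(
                new Change("+I", 1, 10_200),
                new Change("+I", 2, 12_900),
                new Change("-D", 1, 13_500));
        System.out.println(emitted(withWait, 12_500));
    }
}
```

Running `main` prints `[snapshot +I[1], stream +I[1], stream +I[2], stream -D[1]]` for the first scenario, where `+I[1]` shows up twice, and `[snapshot +I[1], stream +I[2], stream -D[1]]` for the second. The second scenario mirrors the workaround described above: once the initial inserts fall in an earlier second than the transition point, the model no longer replays them as streaming records.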
