[
https://issues.apache.org/jira/browse/FLINK-36397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905945#comment-17905945
]
Zhongmin Qiao commented on FLINK-36397:
---------------------------------------
There are two cases:
# If {{isBackfillSkipped}} is true, the data is replayed from {{lw}},
so some data is replayed twice.
# If {{isBackfillSkipped}} is false, the data is replayed from {{hw}},
so data inserted between {{lw}} and {{hw}} is lost.
The underlying issue is that, during the snapshot process, the {{SELECT}} and
{{SHOW MASTER STATUS}} commands are not executed within the same transaction.
Because of this gap, data can be replayed twice or lost.
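
The race can be reproduced in a toy model. The sketch below is only an illustration of the two cases described in this comment, not Flink CDC or MySQL code: {{ToyDatabase}}, {{read_split}} and the row names are hypothetical, the binlog offset is modelled as a list index, and no backfill merge of the {{lw}}..{{hw}} range is modelled.

```python
from collections import Counter


class ToyDatabase:
    """Hypothetical stand-in for a table plus its binlog (not a real API)."""

    def __init__(self):
        self.rows = []    # current table contents
        self.binlog = []  # append-only change log

    def insert(self, row):
        self.rows.append(row)
        self.binlog.append(row)

    def select_all(self):
        # Plays the role of the snapshot SELECT.
        return list(self.rows)

    def current_offset(self):
        # Plays the role of SHOW MASTER STATUS.
        return len(self.binlog)


def read_split(replay_from_lw):
    """Return how many times each row is delivered downstream."""
    db = ToyDatabase()
    db.insert("r0")             # committed before the snapshot starts

    lw = db.current_offset()    # low watermark
    db.insert("a")              # concurrent write before the SELECT
    snapshot = db.select_all()  # sees r0 and a
    db.insert("b")              # concurrent write after the SELECT
    hw = db.current_offset()    # high watermark, read in a separate step
    db.insert("c")              # write after the high watermark

    # Binlog phase: deliver every change from the chosen start offset.
    start = lw if replay_from_lw else hw
    return Counter(snapshot + db.binlog[start:])


from_lw = read_split(replay_from_lw=True)
from_hw = read_split(replay_from_lw=False)
print("replay from lw:", dict(from_lw))
print("replay from hw:", dict(from_hw))
```

In this model, replaying from {{lw}} delivers row {{a}} twice, because the {{SELECT}} already saw it, and replaying from {{hw}} never delivers row {{b}}, because it was written after the {{SELECT}} but before the offset was read.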
CC [~diwu] [Ruan
Hang|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ruanhang1993]
> Using the offset obtained after a query transaction as a high watermark
> cannot ensure exactly-once semantics.
> -------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-36397
> URL: https://issues.apache.org/jira/browse/FLINK-36397
> Project: Flink
> Issue Type: Bug
> Components: Flink CDC
> Affects Versions: cdc-3.2.0
> Reporter: Zhongmin Qiao
> Assignee: Zhongmin Qiao
> Priority: Major
> Labels: pull-request-available
> Attachments: picture1.png
>
>
> !picture1.png|width=564,height=357!
> Using the offset obtained after a query transaction as a high watermark
> cannot ensure exactly-once semantics because "show master status" and the
> query action are not in the same transaction. Data may be inserted
> between the query action and the retrieval of the high watermark. As a
> result, this data will be lost, since we only deliver data after the high
> watermark during the binlog phase.