[ https://issues.apache.org/jira/browse/FLINK-37848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956239#comment-17956239 ]
Darcy Lin commented on FLINK-37848:
-----------------------------------
We can add a new startup mode called "timestamp-with-snapshot". On a
stateless start, it would behave like the "initial" startup mode and create a
MySqlHybridSplitAssigner, but with its snapshot phase already marked as
completed, so the job starts consuming the binlog from the configured
timestamp. Because the state checkpointed by MySqlHybridSplitAssigner is a
HybridPendingSplitsState, a later stateful restart in "initial" mode would
still create a MySqlHybridSplitAssigner. When new tables are added, that
assigner can then snapshot their historical data, as configured, before
continuing with the binlog.
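A minimal sketch of the proposal, modelled with hypothetical Java types (TimestampWithSnapshotSketch, HybridState and its fields are illustrative stand-ins, not Flink CDC classes). A stateless start records the currently captured tables as already snapshotted and the snapshot phase as finished; a later restart from that hybrid state can then find the tables that still need a historical snapshot. In the real connector, newly added table discovery is governed by its own configuration (for example `scan.newly-added-table.enabled`); here it is reduced to a simple set difference.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposed "timestamp-with-snapshot" mode; not Flink CDC code.
public class TimestampWithSnapshotSketch {

    // Stand-in for HybridPendingSplitsState: tables already covered by the snapshot phase,
    // whether that phase is finished, and where binlog reading starts (epoch millis).
    record HybridState(Set<String> snapshottedTables, boolean snapshotFinished, long binlogStartMillis) {}

    // Stateless start: treat every currently captured table as already snapshotted and mark
    // the snapshot phase finished, so reading goes straight to the binlog at the timestamp.
    static HybridState statelessStart(Set<String> capturedTables, long startupTimestampMillis) {
        return new HybridState(new LinkedHashSet<>(capturedTables), true, startupTimestampMillis);
    }

    // Stateful restart: because the restored state is hybrid, a hybrid assigner is rebuilt and
    // can compare the configured tables against the state to find tables needing a snapshot.
    static List<String> tablesToSnapshotOnRestore(HybridState restored, Set<String> configuredTables) {
        List<String> newlyAdded = new ArrayList<>();
        for (String table : configuredTables) {
            if (!restored.snapshottedTables().contains(table)) {
                newlyAdded.add(table);
            }
        }
        return newlyAdded;
    }

    public static void main(String[] args) {
        HybridState restored = statelessStart(Set.of("db.orders"), 1_700_000_000_000L);
        Set<String> configured = new LinkedHashSet<>(List.of("db.orders", "db.customers"));
        // Only the newly added table needs a historical snapshot.
        System.out.println(tablesToSnapshotOnRestore(restored, configured));
    }
}
```

Running `main` prints `[db.customers]`: db.orders, which was captured at the stateless start, is not re-read, while the newly added table gets a historical snapshot before its binlog is consumed.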
> After timestamp mode is enabled, newly added tables will not have all their
> data synchronized.
> ----------------------------------------------------------------------------------------------
>
> Key: FLINK-37848
> URL: https://issues.apache.org/jira/browse/FLINK-37848
> Project: Flink
> Issue Type: Bug
> Components: Flink CDC
> Affects Versions: cdc-3.4.0
> Reporter: Darcy Lin
> Priority: Major
>
> If a synchronization task is first started with the timestamp startup mode,
> then even if scan.startup.mode is later set to initial, state recovery can
> only create a MySqlBinlogSplitAssigner-type SplitAssigner, never a
> MySqlHybridSplitAssigner-type one, because the checkpointed state comes from
> the binlog assigner. The logic for snapshotting newly added tables from the
> CDC configuration file, and then reading their binlog, exists only in
> MySqlHybridSplitAssigner. As a result, if a task is started statelessly from a
> timestamp, there is no way to snapshot the historical data of tables added
> later, which leaves those tables inconsistent with the source.
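The behaviour described in the issue can be sketched as follows, assuming a simplified model in which the type of the restored checkpoint state, not the configured startup mode, decides which assigner is rebuilt. All names (AssignerOnRestoreSketch, BinlogState, HybridState, AssignerKind) are hypothetical stand-ins for MySqlBinlogSplitAssigner / MySqlHybridSplitAssigner and their pending-splits states. The fresh-start rule (only "initial" yields a hybrid assigner) is also a simplification, since the real connector supports further startup modes.

```java
// Hypothetical model of assigner selection in the MySQL CDC source; not Flink CDC code.
public class AssignerOnRestoreSketch {

    enum StartupMode { INITIAL, TIMESTAMP }

    enum AssignerKind { BINLOG_ONLY, HYBRID }

    // Stand-ins for the two kinds of pending-splits state a checkpoint can contain.
    interface PendingSplitsState {}

    record BinlogState(long binlogStartMillis) implements PendingSplitsState {}

    record HybridState(boolean snapshotFinished) implements PendingSplitsState {}

    // Fresh (stateless) start: the configured startup mode picks the assigner.
    static AssignerKind onFreshStart(StartupMode mode) {
        return mode == StartupMode.INITIAL ? AssignerKind.HYBRID : AssignerKind.BINLOG_ONLY;
    }

    // Start without state, then produce the state the chosen assigner would checkpoint.
    static PendingSplitsState checkpointAfterFreshStart(StartupMode mode, long timestampMillis) {
        return onFreshStart(mode) == AssignerKind.HYBRID
                ? new HybridState(false)
                : new BinlogState(timestampMillis);
    }

    // Restore: only the type of the restored state matters; configuredMode is deliberately
    // ignored, which is why switching to INITIAL after a TIMESTAMP start has no effect.
    static AssignerKind onRestore(PendingSplitsState restored, StartupMode configuredMode) {
        return restored instanceof HybridState ? AssignerKind.HYBRID : AssignerKind.BINLOG_ONLY;
    }

    public static void main(String[] args) {
        PendingSplitsState state = checkpointAfterFreshStart(StartupMode.TIMESTAMP, 1_700_000_000_000L);
        // The operator switches scan.startup.mode to "initial" and restarts from the checkpoint.
        System.out.println(onRestore(state, StartupMode.INITIAL));
    }
}
```

Running `main` prints `BINLOG_ONLY`: the job was first started from a timestamp, so its checkpoint only holds binlog state, and no hybrid assigner (and therefore no snapshot of newly added tables) is ever created on restore.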