zlzhang0122 opened a new issue, #3141:
URL: https://github.com/apache/flink-cdc/issues/3141
We hit a strange problem in Flink CDC: when tables are repeatedly added to a
Flink CDC job, the job may fail and report that a very old GTID cannot be
found. We dug into the source code and found the following cause:
1. When CDC switches from the full phase to the incremental phase, the binlog
reader needs to pull the ending offsets of all chunks, and it takes the minimum
of these offsets as the starting offset of the incremental phase. The ending
offset of each chunk is stored in the JM.
2. We add tables repeatedly, and each time we need to suspend the job, alter
the config, and then resume from the latest checkpoint.
3. Normally, once a table has been added, we pull the ending offset of each of
its chunks. The pull process only transfers a size between the JM and the TM,
which means that when there are 100 tables in the JM and we have processed 80,
we continue from the 81st to pull the next offset.
4. The problem is that the order of the splits in the JM and in the TM is not
the same. The JM orders splits by table name (such as a:0, a:1, b:0, b:1).
When a table is added, we need to pull the ending offsets of the newly added
table, but because the JM orders splits by table name, the newly added table
may land in the middle of the list, so we may get the ending offset of a very
old split.
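
The mismatch described in steps 3 and 4 can be illustrated with a small
standalone sketch. It is only a simplified model, not the actual Flink CDC
code: the function name pull_by_size, the split ids, and the integer offsets
(standing in for real binlog positions or GTIDs) are all hypothetical.

```python
# Simplified model of the size-based pull between JM and TM.
# Assumption: integer offsets stand in for real binlog positions / GTIDs,
# and pull_by_size is a hypothetical name, not a Flink CDC API.

def pull_by_size(jm_offsets, received_count):
    """Return the split ids the JM hands out when the TM only reports
    how many splits it has already received."""
    ordered = sorted(jm_offsets)  # JM orders by split id: a:0, a:1, b:0, ...
    return ordered[received_count:]

# Ending offsets of the chunks that finished before the new table was added.
jm_offsets = {"a:0": 100, "a:1": 110, "c:0": 120, "c:1": 130}
tm_received = sorted(jm_offsets)  # the TM already holds all four splits

# Table "b" is added later; its chunks finish at much newer offsets.
jm_offsets.update({"b:0": 900, "b:1": 910})

# The TM asks for "everything after the first 4 splits".
pulled = pull_by_size(jm_offsets, len(tm_received))

# What the TM actually needed: the splits it has not seen yet.
expected = [s for s in jm_offsets if s not in tm_received]

# The minimum ending offset among the pulled splits becomes the start offset.
start_offset = min(jm_offsets[s] for s in pulled)

print("pulled:", pulled)
print("expected:", expected)
print("start offset:", start_offset)
```

In this model the new table "b" sorts between "a" and "c", so skipping the
first four entries of the JM's list returns the old splits of "c" instead of
the new splits of "b", and the resulting start offset comes from an old split.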
<img width="1386" alt="1"
src="https://github.com/apache/flink-cdc/assets/5321584/f8383f59-82d9-4d97-bad7-1aea54c6ac81">