zlzhang0122 opened a new issue, #3141:
URL: https://github.com/apache/flink-cdc/issues/3141

        We hit a strange problem in Flink CDC: when tables are repeatedly added 
to a Flink CDC job, the job may fail and report that a very old GTID cannot be 
found. We dug into the source code and found the cause, described below:
        1. When CDC switches from the full phase to the incremental phase, the 
binlog reader needs to pull the ending offsets of all chunks, and it takes the 
minimum of these offsets as the starting offset of the incremental phase. The 
ending offset of each chunk is stored in the JM.
   
        2. We added tables repeatedly, and each time we needed to suspend the 
job, alter the config, and then resume from the latest checkpoint.
   
        3. Normally, once a table has been added, we pull the ending offset of 
each chunk. The pull process transfers a size between the JM and the TM: when 
there are 100 tables in the JM and we have processed 80, we need to process 
number 81 to pull the next offset.
   
        4. The problem is that the order of the splits in the JM and the TM is 
not the same. The JM orders splits by table name (such as a:0, a:1, b:0, b:1). 
When a table is added, we need to pull the ending offsets of the newly added 
table, but because the JM orders the splits by table name, the newly added 
table may land in the middle, so we may get the ending offset of a very old 
split.
   <img width="1386" alt="1" 
src="https://github.com/apache/flink-cdc/assets/5321584/f8383f59-82d9-4d97-bad7-1aea54c6ac81">
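
The mismatch in step 4 can be reproduced with a small, self-contained model. This is only a sketch of the behavior described above, not Flink CDC code: the class `SplitOrderDemo`, the methods `pullFromJm` and `minOffset`, the split ids, and the offset values are all hypothetical. It assumes the JM keeps finished splits sorted by split id and that the TM reports only how many entries it already holds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of the size-based pull described in the issue; not Flink CDC code.
class SplitOrderDemo {

    // JM side: finished splits sorted by split id, mapped to their ending offsets.
    // The TM reports only how many entries it already holds, so the JM answers
    // with every entry after that position in its own sorted view.
    static List<String> pullFromJm(TreeMap<String, Long> finished, int tmSize) {
        List<String> sorted = new ArrayList<>(finished.keySet());
        int from = Math.min(tmSize, sorted.size());
        return new ArrayList<>(sorted.subList(from, sorted.size()));
    }

    // Minimum ending offset among the given splits, i.e. the offset the
    // incremental phase would start from.
    static long minOffset(TreeMap<String, Long> finished, List<String> splitIds) {
        List<Long> offsets = new ArrayList<>();
        for (String id : splitIds) {
            offsets.add(finished.get(id));
        }
        return Collections.min(offsets);
    }

    public static void main(String[] args) {
        TreeMap<String, Long> jm = new TreeMap<>();
        // Tables a and c were captured first, so their ending offsets are old.
        jm.put("a:0", 100L);
        jm.put("a:1", 101L);
        jm.put("c:0", 102L);
        jm.put("c:1", 103L);
        // The TM has already received these four entries.
        int tmSize = 4;

        // Table b is added later; its splits sort into the middle on the JM side.
        jm.put("b:0", 500L);
        jm.put("b:1", 501L);

        List<String> pulled = pullFromJm(jm, tmSize);
        System.out.println(pulled);                // [c:0, c:1]
        System.out.println(minOffset(jm, pulled)); // 102
    }
}
```

In this model the TM asks for everything after position 4 and receives the old splits of table c instead of the new splits of table b, so the minimum offset it computes (102) is far older than the offsets of the newly added table (500 and 501).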
   

