BenJFan commented on issue #1461:
URL: https://github.com/apache/incubator-seatunnel/issues/1461#issuecomment-1073522850


   > > > @BenJFan I haven't started writing code yet; recently I've mainly been trying to understand how to guarantee strict transactional consistency when using Flink CDC: [Can Flink CDC guarantee MySQL transactions](https://github.com/ververica/flink-cdc-connectors/issues/956).
   > > > In addition, at the code level, the three problems above are still unresolved.
   > > 
   > > 
   > > Transactional guarantees cannot be delivered by any single component. For CDC to guarantee transactions, the source must be replayable (the binlog can be replayed), and the sink must either support transactions (traditional or distributed) or make its writes idempotent.
   > 
   > Flink CDC supports binlog replay. The problem I want to solve is making the sink strictly preserve the source-side transactions, rather than simply applying inserts and updates row by row through SQL. Row-by-row replay reproduces the SQL statements but not the transaction boundaries. For example, if the sink goes down in the middle of applying a transaction, the sink data is left partially applied. I think this problem comes down to two key points:
   > 
   > 1. The source must expose transaction information and preserve transaction order.
   > 2. The sink must apply transactions in order and stay idempotent during failure recovery.
   > 
   > My current understanding of these two points:
   > 
   > 1. I found that Debezium's changelog events contain transaction information, but the transaction information in Flink CDC's `SourceRecord` is incomplete. I'm considering enriching the transaction information in Flink CDC, building a separate queue per transaction ID, and finally committing the transactions in GTID order. Would that work?
   > 2. Idempotency during failure recovery mainly means ensuring that no transaction is applied twice, so it may be necessary to record transaction IDs in checkpoints. I haven't worked this part out yet.
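
The following is a minimal Python sketch that combines the per-transaction queues from point 1 with the checkpointed transaction IDs from point 2. It assumes a hypothetical event shape: a `ChangeEvent` with a `kind` of `BEGIN`, `DATA` or `END`, plus a `txn_id`. This shape is loosely modeled on the BEGIN/END transaction markers Debezium can emit when its `provide.transaction.metadata` option is enabled. The names `TransactionBuffer`, `apply_txn` and `snapshot` are illustrative only; they are not Flink CDC or SeaTunnel APIs. Each transaction is buffered until its END marker and then handed to the sink as one unit, so transactions reach the sink in the order they finish in the log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class ChangeEvent:
    # Hypothetical event shape: kind is "BEGIN", "DATA" or "END";
    # row is only meaningful for DATA events.
    kind: str
    txn_id: str
    row: dict = field(default_factory=dict)


class TransactionBuffer:
    """Groups change events per transaction ID and hands each complete
    transaction to the sink in the order its END marker arrives."""

    def __init__(self, apply_txn: Callable[[str, List[dict]], None],
                 committed: Iterable[str] = ()):
        self.apply_txn = apply_txn  # assumed to write all rows atomically
        self.pending: Dict[str, List[dict]] = {}
        # IDs already applied; stands in for state restored from a checkpoint.
        self.committed: Set[str] = set(committed)

    def on_event(self, ev: ChangeEvent) -> None:
        if ev.kind == "BEGIN":
            self.pending[ev.txn_id] = []
        elif ev.kind == "DATA":
            self.pending.setdefault(ev.txn_id, []).append(ev.row)
        elif ev.kind == "END":
            rows = self.pending.pop(ev.txn_id, [])
            if ev.txn_id in self.committed:
                return  # replayed after recovery: skip instead of applying twice
            self.apply_txn(ev.txn_id, rows)
            self.committed.add(ev.txn_id)
        else:
            raise ValueError(f"unknown event kind: {ev.kind}")

    def snapshot(self) -> Set[str]:
        # What a checkpoint would persist in this sketch.
        return set(self.committed)


applied = []
buf = TransactionBuffer(lambda txn, rows: applied.append((txn, rows)))
events = [
    ChangeEvent("BEGIN", "t1"), ChangeEvent("DATA", "t1", {"id": 1}),
    ChangeEvent("BEGIN", "t2"), ChangeEvent("DATA", "t2", {"id": 2}),
    ChangeEvent("DATA", "t1", {"id": 3}),
    ChangeEvent("END", "t2"), ChangeEvent("END", "t1"),
]
for ev in events:
    buf.on_event(ev)
# applied == [("t2", [{"id": 2}]), ("t1", [{"id": 1}, {"id": 3}])]

# Simulated recovery: replaying the same events against the checkpointed IDs.
restored = TransactionBuffer(lambda txn, rows: applied.append((txn, rows)),
                             committed=buf.snapshot())
for ev in events:
    restored.on_event(ev)  # both transactions are skipped; applied is unchanged
```

One caveat: the dedup set only covers transactions recorded in the last completed checkpoint. If the job fails after `apply_txn` but before the next checkpoint, the replay re-applies that transaction. Closing that gap needs one of two things: store the applied ID in the same sink transaction as the rows, or make the sink writes themselves idempotent. Because transactions are applied in commit order, a real implementation could also persist a single high-water mark (such as the last applied binlog position) instead of an ever-growing set.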
   
   1. Transaction order is determined by the transaction ID. Idempotency has to come from how the data is written (the design of the write method); it is independent of failure recovery.
   2. CDC should already support checkpoints.
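
Point 1 says idempotency comes from how the data is written. The Python sketch below illustrates that with the standard-library `sqlite3` module standing in for the sink. Its assumptions are a hypothetical `users` table and a hypothetical change format of `(op, id, name)` tuples. Each source transaction is applied inside one sink transaction using key-based statements: a full-row upsert (`INSERT OR REPLACE`) and a delete by primary key. Re-running the same transaction therefore leaves the table unchanged. On MySQL, the analogous statements would be `INSERT ... ON DUPLICATE KEY UPDATE` or `REPLACE INTO`.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical change format: (op, id, name) with op in {"UPSERT", "DELETE"}.
Change = Tuple[str, int, str]


def apply_transaction(conn: sqlite3.Connection, changes: Iterable[Change]) -> None:
    """Apply one source transaction atomically using idempotent statements."""
    with conn:  # commits on success, rolls back if any statement raises
        for op, row_id, name in changes:
            if op == "UPSERT":
                # Full-row write keyed by primary key: re-running it yields the same row.
                conn.execute(
                    "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
                    (row_id, name),
                )
            elif op == "DELETE":
                # Deleting an already-deleted key is a no-op.
                conn.execute("DELETE FROM users WHERE id = ?", (row_id,))
            else:
                raise ValueError(f"unknown op: {op}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
txn = [("UPSERT", 1, "alice"), ("UPSERT", 2, "bob"), ("DELETE", 2, "")]
apply_transaction(conn, txn)
apply_transaction(conn, txn)  # replay after a simulated failure
print(conn.execute("SELECT id, name FROM users ORDER BY id").fetchall())
# prints [(1, 'alice')]
```

This only holds when the sink writes full row images rather than relative updates (such as `SET balance = balance + 10`), and when replay resumes in order from an earlier position such as the last checkpoint. Under those conditions, re-applying already-applied transactions converges to the same final state.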