BenJFan commented on issue #1461: URL: https://github.com/apache/incubator-seatunnel/issues/1461#issuecomment-1073522850
> > > @BenJFan Recently, I haven't started to write code, mainly to understand how to ensure strict consistency of transactions when using Flink CDC: [Can Flink CDC guarantee MySQL transactions](https://github.com/ververica/flink-cdc-connectors/issues/956) > > > In addition, for the specific code level, I still have the above three problems to be solved. > > > > > > Guaranteed transaction is not only one component that can be completed, the transaction that cdc can guarantee requires not only that the data source can be replayed (binlog can be replayed), but also the sink side to support transactions (traditional transaction or distributed transaction) or write idempotency > > Flink CDC supports binlog replay. The problem I want to solve is that the sink side can strictly guarantee the transactions on the source side, rather than simply inserting and modifying them line by line through SQL ( it only replays SQL, but it can not guarantee transactions. For example, if a sink side transaction suddenly goes down in the middle of execution, there is a problem with the data on the sink side at this time). I think there are several key points to this problem: > > 1. The source side can obtain transaction information and ensure the order > 2. The sink side can ensure the sequential insertion of transactions and the idempotency during fault recovery > > I have some understanding of these two questions: > > 1. I found that the changelog event in debezium contains transaction information, but the transaction information in Flink's SourceRecord is not complete. I'm considering whether to improve the transaction information of Flink CDC, then construct different queues through different transaction id, and finally submit in the order of gtids? > 2. The idempotency of fault recovery is mainly reflected in ensuring that the transaction will not be executed repeatedly, so it may be necessary to introduce checkpoints to record the transaction id, which I haven't thought about yet. 1. The order of transactions is determined by the transaction id. Idempotency needs to be supported by the design of data writing methods, and has nothing to do with fault recovery. 2. CDC should already support checkpoint. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
