Sephiroth1024 opened a new issue, #10461: URL: https://github.com/apache/seatunnel/issues/10461
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

Assumptions:

- Take MySQL as an example.
- When a job is submitted, SeaTunnel queries the current table structures in the target database and stores them in `AbstractDebeziumDeserializationSchema#tableChangesStructMap`, referred to below as `tableChangesStructMap`.
- For simplicity, assume that the worker performs the checkpoint (in reality the master does).
- Assume that there is a database with two tables, tb1 and tb2.

1. Submit a job.

   <img width="544" height="572" alt="Image" src="https://github.com/user-attachments/assets/00d7b2bf-7787-404c-8d15-569525f1b02d" />

2. A new table called tb3 is then created.

   <img width="560" height="188" alt="Image" src="https://github.com/user-attachments/assets/31751204-d68d-4e36-9f13-b408bd370622" />

3. A checkpoint is performed.

   <img width="1032" height="186" alt="Image" src="https://github.com/user-attachments/assets/83ece3ba-b838-4f5f-a475-66a4b20117d1" />

4. The task fails for some reason and the master retries it. The retried task starts from wrong table structures, because the master's `tableChangesStructMap` is stale: the master builds `tableChangesStructMap` only once, when the job is submitted.

   <img width="542" height="406" alt="Image" src="https://github.com/user-attachments/assets/fc18adb3-7433-4e09-8936-9f75ce6e77de" />

5. Another checkpoint is performed, and the wrong table structures are now stored in the checkpoint file permanently.

   <img width="1026" height="174" alt="Image" src="https://github.com/user-attachments/assets/d1374ee5-75ab-44e1-8d61-56b3195ee922" />

This can cause exceptions such as `Can't obtain schema for table xxx`. It can also cause Debezium exceptions such as `Data row is smaller than a column index`. When Debezium starts, it recovers its `DatabaseSchema` from `DatabaseHistory`. SeaTunnel's implementation is `EmbeddedDatabaseHistory`, which recovers from the latest checkpoint. Which exception occurs depends on which DDL statements the database has executed.

### SeaTunnel Version

2.3.12

### SeaTunnel Config

```conf
//
```

### Running Command

```shell
//
```

### Error Exception

```log
//
```

### Zeta or Flink or Spark Version

_No response_

### Java or Scala Version

_No response_

### Screenshots

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
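
The failure sequence in steps 1 to 5 of "What happened" can be modeled with a small toy program. This is only a sketch of the reported behavior as read from the description, not SeaTunnel code: `StaleSchemaSketch`, `Master`, `Worker`, and all method names are hypothetical, and the table DDL strings are made up. It assumes that the checkpoint in step 3 contains tb3 and that the retried task in step 4 is seeded from the master's submit-time copy.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the reported sequence; every name here is hypothetical. */
class StaleSchemaSketch {

    /** Stands in for the master: it builds its table map once, at job submission. */
    static final class Master {
        private final Map<String, String> submitTimeTables;

        Master(Map<String, String> tablesAtSubmit) {
            this.submitTimeTables = new LinkedHashMap<>(tablesAtSubmit);
        }

        /** Every (re)start hands out a copy of the submit-time map, which is never refreshed. */
        Worker startWorker() {
            return new Worker(new LinkedHashMap<>(submitTimeTables));
        }
    }

    /** Stands in for the running task that tracks table structures and checkpoints them. */
    static final class Worker {
        private final Map<String, String> tables;

        Worker(Map<String, String> tables) {
            this.tables = tables;
        }

        /** A DDL event observed while the job runs updates only the worker's copy. */
        void onCreateTable(String name, String ddl) {
            tables.put(name, ddl);
        }

        /** The checkpoint records whatever tables the worker currently knows. */
        Set<String> checkpoint() {
            return new TreeSet<>(tables.keySet());
        }
    }

    /** Replays steps 1 to 5 and returns the table names stored by each checkpoint. */
    static List<Set<String>> simulate() {
        Map<String, String> database = new LinkedHashMap<>();
        database.put("tb1", "CREATE TABLE tb1 (id INT)");
        database.put("tb2", "CREATE TABLE tb2 (id INT)");

        Master master = new Master(database);                     // step 1: submit the job
        Worker worker = master.startWorker();
        worker.onCreateTable("tb3", "CREATE TABLE tb3 (id INT)"); // step 2: tb3 is created
        Set<String> firstCheckpoint = worker.checkpoint();        // step 3: checkpoint

        Worker retried = master.startWorker();                    // step 4: failure and retry
        Set<String> secondCheckpoint = retried.checkpoint();      // step 5: checkpoint again

        return List.of(firstCheckpoint, secondCheckpoint);
    }

    public static void main(String[] args) {
        List<Set<String>> checkpoints = simulate();
        System.out.println("checkpoint before retry: " + checkpoints.get(0));
        System.out.println("checkpoint after retry:  " + checkpoints.get(1));
    }
}
```

In this model the first checkpoint contains tb1, tb2, and tb3, while the checkpoint taken after the retry contains only tb1 and tb2. That loss of tb3 is the kind of state that would later surface as a missing or mismatched schema.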
