Sephiroth1024 opened a new issue, #10461:
URL: https://github.com/apache/seatunnel/issues/10461

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22)
 and found no similar issues.
   
   
   ### What happened
   
   Assumptions:
   - MySQL is used as the example.
   - When a job is submitted, SeaTunnel queries the current table structures 
in the target database and stores them in 
`AbstractDebeziumDeserializationSchema#tableChangesStructMap`, 
referred to below as `tableChangesStructMap`.
   - For simplicity, assume that the worker performs the checkpoint (in 
reality the master does).
   - Assume a database with two tables, tb1 and tb2.
   
   1. Now we submit a job.
   
   <img width="544" height="572" alt="Image" 
src="https://github.com/user-attachments/assets/00d7b2bf-7787-404c-8d15-569525f1b02d"
 />
   
   2. Then a new table called tb3 is created.
   
   <img width="560" height="188" alt="Image" 
src="https://github.com/user-attachments/assets/31751204-d68d-4e36-9f13-b408bd370622"
 />
   
   3. Perform a checkpoint.
   
   <img width="1032" height="186" alt="Image" 
src="https://github.com/user-attachments/assets/83ece3ba-b838-4f5f-a475-66a4b20117d1"
 />
   
   4. The task fails for some reason and the master retries it. The retry 
goes wrong because the master's `tableChangesStructMap` is stale: the master 
builds `tableChangesStructMap` only once, when the job is submitted.
   
   <img width="542" height="406" alt="Image" 
src="https://github.com/user-attachments/assets/fc18adb3-7433-4e09-8936-9f75ce6e77de"
 />
   
   5. Perform a checkpoint. The wrong table structures are now stored in the 
checkpoint file permanently.
   
   <img width="1026" height="174" alt="Image" 
src="https://github.com/user-attachments/assets/d1374ee5-75ab-44e1-8d61-56b3195ee922"
 />
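
The five steps above can be replayed with a small, self-contained simulation. This is only a sketch of the reported behavior, not SeaTunnel code: the class `StaleSchemaCheckpointSketch`, its methods, and the use of plain `Map<String, String>` values in place of the real `tableChangesStructMap` entries are all hypothetical, and the checkpoint is modeled as a simple copy of the worker's map.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleSchemaCheckpointSketch {

    /** Hypothetical stand-in for the table structures queried at job submission. */
    static Map<String, String> snapshotAtSubmit() {
        Map<String, String> structures = new HashMap<>();
        structures.put("tb1", "schema-of-tb1");
        structures.put("tb2", "schema-of-tb2");
        return structures;
    }

    /** Replays steps 1 to 5 and returns the content of the last checkpoint. */
    static Map<String, String> replayScenario() {
        // Step 1: the job is submitted; the master builds its map exactly once.
        Map<String, String> masterMap = snapshotAtSubmit();
        Map<String, String> workerMap = new HashMap<>(masterMap);

        // Step 2: tb3 is created; only the running worker learns about it.
        workerMap.put("tb3", "schema-of-tb3");

        // Step 3: the checkpoint stores the worker's up-to-date view.
        Map<String, String> checkpoint = new HashMap<>(workerMap);

        // Step 4: the task fails; the retry starts from the master's stale map.
        workerMap = new HashMap<>(masterMap);

        // Step 5: the next checkpoint overwrites the good one with stale data.
        checkpoint = new HashMap<>(workerMap);
        return checkpoint;
    }

    public static void main(String[] args) {
        Map<String, String> lastCheckpoint = replayScenario();
        System.out.println("tb3 in last checkpoint: " + lastCheckpoint.containsKey("tb3"));
    }
}
```

In this model the checkpoint taken in step 3 contains tb3, while the one taken in step 5 does not, which matches the described loss of the newer table structure.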
   
   
   This can cause exceptions such as `Can't obtain schema for table xxx`.
   
   It can also cause Debezium exceptions such as `Data row is smaller than 
a column index`. When Debezium starts, it recovers its `DatabaseSchema` from 
`DatabaseHistory`. SeaTunnel's implementation is `EmbeddedDatabaseHistory`, 
which recovers from the latest checkpoint, so Debezium starts from the stale 
table structures.
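
One way a stale schema could lead to the second message is sketched below. This is a toy model under stated assumptions, not Debezium's or SeaTunnel's code: the class `StaleColumnIndexSketch` and its `convert` method are hypothetical, and the scenario (a column dropped in the database while the recovered schema still lists it) is just one assumed example of a DDL change.

```java
import java.util.Arrays;
import java.util.List;

public class StaleColumnIndexSketch {

    /** Reads one value per schema column by position (hypothetical converter). */
    static Object[] convert(List<String> schemaColumns, Object[] row) {
        Object[] out = new Object[schemaColumns.size()];
        for (int index = 0; index < schemaColumns.size(); index++) {
            // The schema expects a value at this index, but the row is too short.
            if (row.length <= index) {
                throw new IllegalStateException(
                        "Data row is smaller than a column index (column '"
                                + schemaColumns.get(index) + "', index " + index + ")");
            }
            out[index] = row[index];
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed scenario: the recovered schema still lists column "c",
        // which has since been dropped, so incoming rows carry two values.
        List<String> staleSchema = Arrays.asList("id", "name", "c");
        Object[] currentRow = {1, "a"};
        try {
            convert(staleSchema, currentRow);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a schema that matches the row (two columns), `convert` returns normally; the failure only appears when the recovered schema and the live table disagree.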
   
   Which exception occurs depends on the DDL statements executed against 
the database.
   
   ### SeaTunnel Version
   
   2.3.12
   
   ### SeaTunnel Config
   
   ```conf
   //
   ```
   
   ### Running Command
   
   ```shell
   //
   ```
   
   ### Error Exception
   
   ```log
   //
   ```
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.