DanielCarter-stack commented on issue #10461:
URL: https://github.com/apache/seatunnel/issues/10461#issuecomment-3859117935

   <!-- code-pr-reviewer -->
   Thanks for reporting this potential bug. The analysis suggests a real issue: 
when a Zeta CDC job restores from a checkpoint after DDL changes occurred during 
execution, the master node's `tableChangesStructMap` in 
`AbstractDebeziumDeserializationSchema` may still hold the pre-DDL schema, while 
the workers hold the updated map. This can cause `Can't obtain schema for table 
xxx` or `Data row is smaller than a column index` errors.
   
   **Key evidence:**
   - `tableChangesStructMap` is `final` and initialized only in the constructor 
(`AbstractDebeziumDeserializationSchema.java:48-58`)
   - Workers dynamically add new schemas via `deserialize()`, but the master doesn't 
process data streams, so its copy is never updated
   - The checkpoint stores the worker's `historyTableChanges` 
(`IncrementalSourceReader.java:267-282`), but `restoreCheckpointProducedType` 
doesn't restore this map (`SeaTunnelRowDebeziumDeserializeSchema.java:295-324`)
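
   To make the suspected divergence concrete, here is a minimal, self-contained 
sketch. It is not SeaTunnel code: `StaleSchemaSketch`, `Deserializer`, 
`onSchemaChange`, and `schemaOf` are hypothetical names, and schemas are reduced 
to plain strings. It only shows how two copies of a constructor-initialized map 
drift apart when just one of them observes a DDL event.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the deserialization schema; NOT SeaTunnel's real class.
public class StaleSchemaSketch {

    public static final class Deserializer {
        // Mirrors the reported shape: a final map filled only in the constructor.
        private final Map<String, String> tableChangesStructMap = new HashMap<>();

        public Deserializer(Map<String, String> initialSchemas) {
            tableChangesStructMap.putAll(initialSchemas);
        }

        // Assumption: only instances that read the data stream ever call this.
        public void onSchemaChange(String tableId, String schema) {
            tableChangesStructMap.put(tableId, schema);
        }

        public String schemaOf(String tableId) {
            return tableChangesStructMap.get(tableId);
        }
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("db.orders", "id,amount");

        Deserializer master = new Deserializer(initial);
        Deserializer worker = new Deserializer(initial);

        // A DDL change (added column) is seen by the worker only.
        worker.onSchemaChange("db.orders", "id,amount,currency");

        System.out.println("worker: " + worker.schemaOf("db.orders"));
        System.out.println("master: " + master.schemaOf("db.orders"));
    }
}
```

   In this sketch the worker reports `id,amount,currency` while the master still 
reports `id,amount`, which is the kind of mismatch that could surface as the 
errors quoted above.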
   
   **To help verify:**
   1. Are you using the Zeta engine with `exactly_once=true`?
   2. Did the DDL change occur between checkpoints, before the failure?
   3. Can you share the full exception stack trace?
   
   This would be a valuable contribution if confirmed. A possible fix could 
involve restoring `tableChangesStructMap` in the checkpoint recovery logic.
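
   As a rough illustration of that idea (not the actual patch), the sketch below 
replays checkpointed table changes into the map during recovery. 
`RestoreSchemaSketch`, `TableChange`, and `restoreFromCheckpoint` are hypothetical 
names; the real signature of `restoreCheckpointProducedType` and the real type of 
`historyTableChanges` are not assumed here. Note that a `final` map can still be 
updated in place, so the field declaration would not need to change.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a recovery hook; NOT SeaTunnel's real API.
public class RestoreSchemaSketch {

    // Hypothetical stand-in for one checkpointed table change entry.
    public static final class TableChange {
        public final String tableId;
        public final String schema;

        public TableChange(String tableId, String schema) {
            this.tableId = tableId;
            this.schema = schema;
        }
    }

    // `final` fixes the reference, not the contents, so entries can be replaced.
    private final Map<String, String> tableChangesStructMap = new HashMap<>();

    public RestoreSchemaSketch(Map<String, String> initialSchemas) {
        tableChangesStructMap.putAll(initialSchemas);
    }

    // Replays checkpointed changes in order; later entries win for the same table.
    public void restoreFromCheckpoint(List<TableChange> historyTableChanges) {
        for (TableChange change : historyTableChanges) {
            tableChangesStructMap.put(change.tableId, change.schema);
        }
    }

    public String schemaOf(String tableId) {
        return tableChangesStructMap.get(tableId);
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("db.orders", "id,amount");
        RestoreSchemaSketch master = new RestoreSchemaSketch(initial);

        List<TableChange> history = new ArrayList<>();
        history.add(new TableChange("db.orders", "id,amount,currency"));
        master.restoreFromCheckpoint(history);

        System.out.println("master after restore: " + master.schemaOf("db.orders"));
    }
}
```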


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
