DanielCarter-stack commented on issue #10449:
URL: https://github.com/apache/seatunnel/issues/10449#issuecomment-3851063482

   <!-- code-pr-reviewer -->
   This issue appears valid. Based on the code analysis, the root cause is in 
`SemanticXidGenerator`:
   
   1. **Static SecureRandom**: The `SECURE_RANDOM` field is `static` and shared 
across all `SemanticXidGenerator` instances. When multiple tables create 
separate `JdbcExactlyOnceSinkWriter` instances concurrently (via 
`MultiTableSinkWriter`), the `getRandomBytes()` call in `open()` may generate 
identical 4-byte `bqual` values.
   
   2. **XID collision**: XID uniqueness depends on `formatId (201)` + `gtrid 
(job_id + subtask_index + checkpoint_id)` + `bqual (4 random bytes)`. Within 
the same checkpoint, multiple sink writers share identical formatId, job_id, 
subtask_index, and checkpoint_id. If `bqual` collides, the entire XID is 
duplicated, causing MySQL's `XAER_DUPID` error (-8).
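
To make point 2 concrete, here is a simplified, hypothetical model of the XID layout described above. It is not the actual `SemanticXidGenerator` code: `SimpleXid`, `gtrid()` and the 16-byte placeholder job id are assumptions. Two writers in the same subtask and checkpoint build the same `gtrid`, so only `bqual` keeps their XIDs apart:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class XidCompositionSketch {
    // Hypothetical, simplified model of the XID layout; not the SeaTunnel class itself.
    static final int FORMAT_ID = 201;

    static final class SimpleXid {
        final int formatId;
        final byte[] gtrid;
        final byte[] bqual;

        SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
            this.formatId = formatId;
            this.gtrid = gtrid.clone();
            this.bqual = bqual.clone();
        }

        // Two XIDs are the same transaction branch when all three parts match.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleXid)) {
                return false;
            }
            SimpleXid other = (SimpleXid) o;
            return formatId == other.formatId
                    && Arrays.equals(gtrid, other.gtrid)
                    && Arrays.equals(bqual, other.bqual);
        }

        @Override
        public int hashCode() {
            return 31 * (31 * formatId + Arrays.hashCode(gtrid)) + Arrays.hashCode(bqual);
        }
    }

    // gtrid = job id bytes + subtask index + checkpoint id (no table information).
    static byte[] gtrid(byte[] jobId, int subtaskIndex, long checkpointId) {
        return ByteBuffer.allocate(jobId.length + Integer.BYTES + Long.BYTES)
                .put(jobId)
                .putInt(subtaskIndex)
                .putLong(checkpointId)
                .array();
    }

    public static void main(String[] args) {
        byte[] jobId = new byte[16]; // placeholder job id (assumed size)
        byte[] g = gtrid(jobId, 0, 42L);
        // Writers for two different tables, same subtask and checkpoint: identical gtrid.
        SimpleXid a = new SimpleXid(FORMAT_ID, g, new byte[] {1, 2, 3, 4});
        SimpleXid b = new SimpleXid(FORMAT_ID, g, new byte[] {1, 2, 3, 4});
        SimpleXid c = new SimpleXid(FORMAT_ID, g, new byte[] {1, 2, 3, 5});
        System.out.println(a.equals(b)); // prints true: same bqual means a duplicate XID
        System.out.println(a.equals(c)); // prints false: bqual is the only distinguishing part
    }
}
```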
   
   **Evidence files**:
   - `connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/xa/SemanticXidGenerator.java` (static `SECURE_RANDOM` and `bqualBuffer` generation)
   - `connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/sink/JdbcExactlyOnceSinkWriter.java` (creates one `XidGenerator` per instance)
   - `seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/multitablesink/MultiTableSinkWriter.java` (maps each table to a separate `SinkWriter`)
   
   **Workarounds**:
   - Split into separate jobs (one schema per job)
   - Set `is_exactly_once: false`
   - Use different target databases
   
   **Proposed fix**: Incorporate the table identifier into `bqual` generation 
instead of relying solely on random bytes, so that each table's XID is 
guaranteed to be unique within the same checkpoint.
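
One possible shape for that fix, sketched under assumptions: the hypothetical `bqualFor(tableId)` helper puts a SHA-256 digest of the table identifier in front of the 4 random bytes. The result is a 36-byte `bqual`, within the 64-byte limit that XA and MySQL place on branch qualifiers. How the table identifier would be passed to `SemanticXidGenerator` is not shown here and would need an API change.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Arrays;

public class TableAwareBqual {
    // XA and MySQL limit a branch qualifier to 64 bytes.
    static final int MAX_BQUAL_BYTES = 64;
    private static final SecureRandom SECURE_RANDOM = new SecureRandom();

    /** Hypothetical helper: SHA-256(tableId) (32 bytes) followed by 4 random bytes. */
    static byte[] bqualFor(String tableId) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(tableId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
        byte[] random = new byte[Integer.BYTES];
        SECURE_RANDOM.nextBytes(random);

        byte[] bqual = new byte[digest.length + random.length];
        System.arraycopy(digest, 0, bqual, 0, digest.length);
        System.arraycopy(random, 0, bqual, digest.length, random.length);
        return bqual; // 36 bytes, below MAX_BQUAL_BYTES
    }

    public static void main(String[] args) {
        byte[] orders = bqualFor("db1.orders");
        byte[] users = bqualFor("db1.users");
        // The table-derived prefixes differ, so the random suffix no longer decides uniqueness.
        System.out.println(Arrays.equals(Arrays.copyOf(orders, 32), Arrays.copyOf(users, 32))); // prints false
    }
}
```

Hashing keeps the length bounded for long table names. The trade-off is that uniqueness then rests on SHA-256 not colliding, which is negligible in practice. Embedding the raw UTF-8 table name whenever it fits would make the per-table uniqueness strict.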
   
   Could you share the output of `XA RECOVER CONVERT XID;` from MySQL when the 
error occurs? This would confirm duplicate XIDs are present in the transaction 
manager.
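
When collecting that output, it may help to split each `XA RECOVER CONVERT XID` row into its `gtrid` and `bqual` parts. The sketch below assumes the `data` column is a `0x`-prefixed hex string whose first `gtrid_length` bytes are the gtrid and whose remaining `bqual_length` bytes are the bqual. Several prepared branches under one gtrid would point to multiple writers in the same job, subtask and checkpoint. `XaRecoverDecoder` and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class XaRecoverDecoder {
    /** Splits one row's hex `data` into {gtridHex, bqualHex} using the reported lengths. */
    static String[] split(int gtridLength, int bqualLength, String dataHex) {
        String hex = dataHex.toLowerCase(Locale.ROOT);
        if (hex.startsWith("0x")) {
            hex = hex.substring(2);
        }
        // Two hex characters per byte.
        if (hex.length() != 2 * (gtridLength + bqualLength)) {
            throw new IllegalArgumentException("data does not match gtrid_length + bqual_length");
        }
        return new String[] {hex.substring(0, 2 * gtridLength), hex.substring(2 * gtridLength)};
    }

    /** Groups branch qualifiers (bqual) by global transaction id (gtrid). */
    static Map<String, List<String>> groupByGtrid(List<String[]> splitRows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] row : splitRows) {
            groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical rows (gtrid_length, bqual_length, data) with a 2-byte gtrid and 1-byte bqual.
        List<String[]> rows = new ArrayList<>();
        rows.add(split(2, 1, "0x0A0B01"));
        rows.add(split(2, 1, "0x0A0B02"));
        System.out.println(groupByGtrid(rows)); // prints {0a0b=[01, 02]}
    }
}
```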
