onestardao commented on issue #13752:
URL: https://github.com/apache/iceberg/issues/13752#issuecomment-3177564004
This is a classic **No.11 – “Symbolic Collapse”** pattern: the source offset
moves forward while the sink (Iceberg/Glue) commit fails. After a Glue
exception during commit, the task restarts from the last committed offset,
which already covers the failed batch, so those records aren’t replayed and
look “dropped.” Many Kafka Connect sinks hit this when the commit path
isn’t atomic with offset advancement.
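The failure mode can be modeled in a few lines. This is a toy sketch, not Kafka Connect or Iceberg code: `run_batch`, `failing_once`, `CatalogError`, and the in-memory lists are hypothetical stand-ins for the sink task, the Glue-backed commit, and the topic.

```python
# Toy model of a sink whose offset advances even when the catalog commit fails.
# All names are hypothetical; nothing here is the Kafka Connect or Iceberg API.

class CatalogError(Exception):
    """Stand-in for a Glue exception raised during the table commit."""


def run_batch(records, start_offset, batch_size, catalog_commit, table):
    """Process one batch and return the offset the framework would commit."""
    batch = records[start_offset:start_offset + batch_size]
    try:
        catalog_commit(table, batch)
    except CatalogError:
        pass  # BUG: the failure is swallowed, so the offset still advances
    return start_offset + len(batch)


def failing_once():
    """Build a commit function that fails on its first call, then succeeds."""
    calls = {"n": 0}

    def commit(table, batch):
        calls["n"] += 1
        if calls["n"] == 1:
            raise CatalogError("simulated Glue failure")
        table.extend(batch)

    return commit


records = ["r0", "r1", "r2", "r3"]
table = []
commit = failing_once()
offset = 0
while offset < len(records):
    offset = run_batch(records, offset, 2, commit, table)

missing = [r for r in records if r not in table]
print("committed offset:", offset)
print("rows in table:", table)
print("missing:", missing)
```

The first batch never reaches the table, yet the committed offset moves past it, which is the "dropped records" symptom described above.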
Quick things to check / fix:
1. **Block offset commit on failure**
* In the sink task, make sure `preCommit()` returns an **empty map** if the
last write/commit didn’t succeed (`flush()` returns `void`, so it can only
block the commit by throwing). If `preCommit()` still returns offsets after a
Glue error, Connect will commit them.
2. **Two-phase write discipline**
* Stage files → commit to Iceberg **and** update Glue → only then expose
offsets. Any exception must keep offsets unadvanced so the batch replays after
restart.
3. **Retries & idempotence**
* Add bounded retries/backoff around Glue catalog ops; make the write
idempotent so a replayed batch won’t double-apply (e.g., deterministic file
paths / transaction tokens).
4. **DLQ instead of silent loss**
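* Set `errors.tolerance=all` together with a DLQ topic
(`errors.deadletterqueue.topic.name`) for irrecoverable records, so failures
are visible rather than disappearing. Connect’s built-in DLQ covers conversion
and transform errors; failures inside the task’s write path reach it only if
the task reports them (for example via `ErrantRecordReporter`).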
5. **Minimal repro to prove it**
* Single partition, tiny batch, `offset.flush.interval.ms` small. Inject
a Glue exception right after files are written but before the catalog commit.
If offsets advance anyway, the bug is in the task’s `flush/preCommit` logic.
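Items 1, 2, and 5 can be sketched together as a small model with fault injection. It is a simplified stand-in, not the Java `SinkTask` API: `ToySinkTask`, `run`, `fail_first_commit`, and the `"topic-0"` partition key are hypothetical, and a "restart" is modeled as building a fresh task that resumes from the last committed offset.

```python
# Toy model of "only expose offsets after the catalog commit succeeds".
# Hypothetical names throughout; this is not the Kafka Connect or Iceberg API.

class CatalogError(Exception):
    """Stand-in for a Glue exception raised during the table commit."""


class ToySinkTask:
    def __init__(self, table, catalog_commit):
        self.table = table
        self.catalog_commit = catalog_commit
        self.staged = []

    def put(self, records):
        # Phase 1: stage data only; nothing is visible in the table yet.
        self.staged.extend(records)

    def pre_commit(self, current_offsets):
        # Phase 2: commit staged data, and only then expose offsets.
        try:
            self.catalog_commit(self.table, self.staged)
        except CatalogError:
            self.staged = []
            return {}  # empty map: nothing is safe to commit
        self.staged = []
        return dict(current_offsets)


def fail_first_commit():
    """Inject one failure after staging but before the catalog commit."""
    calls = {"n": 0}

    def commit(table, batch):
        calls["n"] += 1
        if calls["n"] == 1:
            raise CatalogError("simulated Glue failure")
        table.extend(batch)

    return commit


def run(records, table, catalog_commit, batch_size=2):
    """Tiny driver: single partition, small batches, restart on failure."""
    committed, restarts = 0, 0
    task = ToySinkTask(table, catalog_commit)
    while committed < len(records):
        batch = records[committed:committed + batch_size]
        task.put(batch)
        safe = task.pre_commit({"topic-0": committed + len(batch)})
        if not safe:
            # Offsets did not advance, so the restarted task replays the batch.
            task = ToySinkTask(table, catalog_commit)
            restarts += 1
            continue
        committed = safe["topic-0"]
    return committed, restarts


records = ["r0", "r1", "r2", "r3"]
table = []
committed, restarts = run(records, table, fail_first_commit())
print("committed offset:", committed)
print("restarts:", restarts)
print("rows in table:", table)
```

Because the injected failure leaves the committed offset untouched, the first batch is replayed after the modeled restart and every record ends up in the table. The driver loops forever if the commit never succeeds, so a real harness would cap the number of restarts.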
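For item 3, one possible shape is a bounded retry with exponential backoff around a commit that is keyed by a deterministic token. This is a sketch under assumptions: `commit_with_retry`, `ToyTable`, and `batch_token` are hypothetical helpers, and the in-memory token set stands in for whatever durable record a real sink would keep with the table.

```python
# Toy model of bounded retries plus an idempotent commit keyed by a token.
# Hypothetical names throughout; this is not the Iceberg or Glue API.
import time


class CatalogError(Exception):
    """Stand-in for a transient Glue catalog failure."""


def commit_with_retry(commit, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call commit() up to `attempts` times, doubling the delay each retry."""
    for attempt in range(1, attempts + 1):
        try:
            return commit()
        except CatalogError:
            if attempt == attempts:
                raise  # retries exhausted: surface the failure
            sleep(base_delay * 2 ** (attempt - 1))


class ToyTable:
    def __init__(self):
        self.rows = []
        self.applied_tokens = set()

    def commit(self, token, rows):
        """Apply rows once per token; return False if it was already applied."""
        if token in self.applied_tokens:
            return False
        self.rows.extend(rows)
        self.applied_tokens.add(token)
        return True


def batch_token(partition, first_offset, last_offset):
    """Deterministic token: the same batch always maps to the same value."""
    return f"{partition}:{first_offset}-{last_offset}"


table = ToyTable()
token = batch_token("topic-0", 0, 1)
calls = {"n": 0}
delays = []


def flaky_commit():
    calls["n"] += 1
    applied = table.commit(token, ["r0", "r1"])
    if calls["n"] == 1:
        # Simulate a commit that landed but whose response was lost.
        raise CatalogError("simulated lost response")
    return applied


commit_with_retry(flaky_commit, sleep=delays.append)
print("rows in table:", table.rows)
print("commit calls:", calls["n"])
print("backoff delays:", delays)
```

The retry re-sends a batch that had already landed, and the token check keeps it from being applied twice. Passing `sleep` as a parameter is only there so the example runs without actually waiting.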
I have a short step-by-step note set for this exact pattern (test harness +
expected `preCommit` behavior and a safe commit sequence). If you want, I can
share the details.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]