DanielLeens commented on issue #11013: URL: https://github.com/apache/seatunnel/issues/11013#issuecomment-4636756956
I traced the current TiDB CDC reader path, and this looks like a real source-side bug worth tracking, not just a sink-side stall.

In the current implementation, `TiDBSourceReader.captureStreamingEvents(...)` always advances the split's `resolvedTs` from `cdcClient.getMaxResolvedTs()`, but downstream emission still depends on the `preWrites` / `commits` matching path and on `flushRows(resolvedTs)`. Nothing in that flow ties the advance of `resolvedTs` to rows actually being emitted, so the symptom you reported is plausible in the current code: checkpoints can keep succeeding and `resolvedTs` can keep moving forward even while some row changes are no longer materialized downstream.

This also looks related to #8815, but it is not identical. #8815 centered on older TiDB-CDC-MIGRATE loss reports, while your reproduction is on 2.3.13 + Flink + normal TiDB-CDC and already narrows the symptom much better.

We should keep this open as a real bug. We have labeled it `help wanted` since the reporter did not opt into a PR.

A good next debugging step would be to verify whether the loss happens around the `PREWRITE` / `COMMIT` matching path during region movement or client pull gaps, because that is the part that can let `resolvedTs` advance independently of the emitted rows.

If someone wants to work on a fix, a small first PR with a deterministic regression test around this reader path would be very valuable.
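To make the suspected failure mode concrete, here is a minimal toy model of it. This is a sketch under stated assumptions, not the SeaTunnel implementation: the class `ResolvedTsGapSketch` and all of its methods are hypothetical, and only the terms `preWrites`, `commits`, `resolvedTs`, and `flushRows` are borrowed from the description above. It assumes a commit is emitted only if its matching prewrite was seen, and that the resolved timestamp advances regardless.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy model only: NOT the SeaTunnel TiDBSourceReader. All names are hypothetical
// and only loosely mirror the preWrites / commits / resolvedTs / flushRows terms.
final class ResolvedTsGapSketch {
    private static final class Commit {
        final String key;
        final long commitTs;
        final String value;

        Commit(String key, long commitTs, String value) {
            this.key = key;
            this.commitTs = commitTs;
            this.value = value;
        }
    }

    // Prewrites waiting for their commit, keyed by "key@startTs".
    private final Map<String, String> preWrites = new HashMap<>();
    // Commits that found their prewrite and are waiting to be flushed.
    private final List<Commit> commits = new ArrayList<>();
    private final List<String> emitted = new ArrayList<>();
    private long resolvedTs = 0L;
    private int unmatchedCommits = 0;

    void onPrewrite(String key, long startTs, String value) {
        preWrites.put(key + "@" + startTs, value);
    }

    void onCommit(String key, long startTs, long commitTs) {
        String value = preWrites.remove(key + "@" + startTs);
        if (value == null) {
            // Assumed hazard: the prewrite never arrived (for example a pull gap),
            // so the commit has no row data and is dropped.
            unmatchedCommits++;
            return;
        }
        commits.add(new Commit(key, commitTs, value));
    }

    // The resolved timestamp advances no matter how many commits were dropped.
    void advance(long maxResolvedTs) {
        resolvedTs = Math.max(resolvedTs, maxResolvedTs);
        flushRows(resolvedTs);
    }

    // Emit every matched commit whose commitTs is at or below the given timestamp.
    private void flushRows(long upTo) {
        Iterator<Commit> it = commits.iterator();
        while (it.hasNext()) {
            Commit c = it.next();
            if (c.commitTs <= upTo) {
                emitted.add(c.key + "=" + c.value);
                it.remove();
            }
        }
    }

    long resolvedTs() { return resolvedTs; }
    List<String> emitted() { return emitted; }
    int unmatchedCommits() { return unmatchedCommits; }

    public static void main(String[] args) {
        ResolvedTsGapSketch reader = new ResolvedTsGapSketch();
        reader.onPrewrite("k1", 10L, "v1");
        reader.onCommit("k1", 10L, 11L);
        reader.onCommit("k2", 12L, 13L); // prewrite for k2 was never seen
        reader.advance(20L);
        // prints "resolvedTs=20 emitted=[k1=v1] unmatched=1"
        System.out.println("resolvedTs=" + reader.resolvedTs()
                + " emitted=" + reader.emitted()
                + " unmatched=" + reader.unmatchedCommits());
    }
}
```

In this model the change to `k2` is silently lost while the resolved timestamp still reaches 20, which is the shape of the reported symptom. A regression test against the real reader could assert the opposite invariant: no commit at or below the checkpointed `resolvedTs` may be left unmatched without failing or being retried.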
