wuhainan commented on issue #11013: URL: https://github.com/apache/seatunnel/issues/11013#issuecomment-4637008473
> I traced the current TiDB CDC reader path, and this does look like a real source-side bug worth tracking, not just a sink-side stall.
>
> In the current implementation, `TiDBSourceReader.captureStreamingEvents(...)` always advances the split's `resolvedTs` from `cdcClient.getMaxResolvedTs()`, but actual downstream emission still depends on the `preWrites` / `commits` matching path and `flushRows(resolvedTs)`. That means the symptom you reported is plausible in the current code: checkpoints can keep succeeding and `resolvedTs` can keep moving forward even while some row changes are no longer materialized downstream.
>
> This also looks related to [#8815](https://github.com/apache/seatunnel/issues/8815), but not identical. [#8815](https://github.com/apache/seatunnel/issues/8815) was centered on older TiDB-CDC-MIGRATE loss reports, while your reproduction is on 2.3.13 + Flink + normal TiDB-CDC and already narrows the symptom much better.
>
> We should keep this open as a real bug. We have labeled it as `help wanted` since the reporter did not opt into a PR. A good next debugging slice would be to verify whether the loss happens around the `PREWRITE` / `COMMIT` matching path during region movement or client pull gaps, because that is the part that can let `resolvedTs` advance independently from emitted rows.
>
> If someone wants to work on a fix, a small first PR with a deterministic regression test around this reader path would be very valuable.

Thanks for the confirmation. I would like to add one more observation: this job was configured with `startup.mode = "initial"`, but it looks like the reader switched to streaming before the initial snapshot was fully completed. The source table had about 521,960 rows, while the target table only had about 91,746 rows when downstream emission stopped. The Flink source/sink metrics were also around 91,746 records.
So it seems that only part of the initial snapshot was emitted before the reader entered the incremental CDC phase. I am not sure whether this is the same root cause as the `resolvedTs` advancing issue, but it may be related.
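
To make the suspected failure mode concrete, here is a toy Java model of how a resolved timestamp can keep advancing while a row is never emitted. It is not the SeaTunnel `TiDBSourceReader` code: the class `ResolvedTsSketch` and all of its methods and buffers are hypothetical names chosen for illustration. The only assumption is that a row becomes emittable once a prewrite and a commit with the same key and start timestamp have been paired.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only; NOT the SeaTunnel TiDBSourceReader implementation. */
public class ResolvedTsSketch {
    // Hypothetical buffers, both keyed by "key@startTs".
    private final Map<String, String> preWrites = new HashMap<>();
    private final Map<String, Long> unmatchedCommits = new HashMap<>();
    // Paired rows ordered by commit timestamp, waiting for resolvedTs to pass them.
    private final TreeMap<Long, List<String>> committed = new TreeMap<>();
    private long resolvedTs = 0L;

    private static String id(String key, long startTs) {
        return key + "@" + startTs;
    }

    void onPrewrite(String key, long startTs, String value) {
        String id = id(key, startTs);
        Long commitTs = unmatchedCommits.remove(id);
        if (commitTs != null) {
            stage(commitTs, key, value);   // commit arrived first, pair now
        } else {
            preWrites.put(id, value);      // wait for the commit
        }
    }

    void onCommit(String key, long startTs, long commitTs) {
        String id = id(key, startTs);
        String value = preWrites.remove(id);
        if (value != null) {
            stage(commitTs, key, value);   // normal case: prewrite seen earlier
        } else {
            unmatchedCommits.put(id, commitTs); // prewrite never seen (for example, a pull gap)
        }
    }

    private void stage(long commitTs, String key, String value) {
        committed.computeIfAbsent(commitTs, t -> new ArrayList<>()).add(key + "=" + value);
    }

    /** Advances resolvedTs unconditionally, then emits paired rows with commitTs <= resolvedTs. */
    List<String> advanceAndFlush(long newResolvedTs) {
        resolvedTs = Math.max(resolvedTs, newResolvedTs);
        List<String> out = new ArrayList<>();
        Map<Long, List<String>> ready = committed.headMap(resolvedTs, true);
        ready.values().forEach(out::addAll);
        ready.clear(); // headMap is a view, so this also removes the rows from 'committed'
        return out;
    }

    long resolvedTs() {
        return resolvedTs;
    }

    int unmatchedCommitCount() {
        return unmatchedCommits.size();
    }

    public static void main(String[] args) {
        ResolvedTsSketch reader = new ResolvedTsSketch();
        reader.onPrewrite("k1", 10, "a");
        reader.onCommit("k1", 10, 12);
        reader.onCommit("k2", 11, 13);     // its prewrite was never delivered
        System.out.println(reader.advanceAndFlush(20));   // only k1 is emitted
        System.out.println(reader.resolvedTs());          // 20, already past k2's commitTs
        System.out.println(reader.unmatchedCommitCount()); // 1, k2 is stuck
    }
}
```

In this model a checkpoint that only records `resolvedTs` would look healthy, which is why a regression test probably needs to assert on emitted rows (or on leftover unmatched entries) and not on timestamp progress alone.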
