wuhainan commented on issue #11013: URL: https://github.com/apache/seatunnel/issues/11013#issuecomment-4645128312
> Thanks for collecting the cleaner reproduction data; this is much stronger evidence.
>
> The new logs make the direction here considerably narrower. In the current reader path, entering streaming mode is not by itself proof that the initial snapshot was fully materialized downstream. `resolvedTs` can continue to move forward once the CDC client starts, while actual row emission still depends on the PREWRITE/COMMIT matching and flush path. That means the symptom you captured is consistent with a real reader-side bug, not just a metrics illusion.
>
> The most important part of your new report is this combination:
>
> * one source split
> * `startup.mode = initial`
> * snapshot starts
> * streaming begins about 18 seconds later
> * downstream row count stalls at `93,354` while the source table had `522,625` rows
> * `resolvedTs` and checkpoints keep advancing afterward
>
> That strongly suggests the problem is in the snapshot-to-stream handoff family, rather than a normal or expected transition. In other words, this now looks much closer to either:
>
> 1. the snapshot phase exiting before the split was fully materialized, or
> 2. the reader entering streaming with incomplete materialization state and then continuing to advance `resolvedTs`.
>
> So this issue is worth keeping separate and open. It is related to `#8815`, but the new evidence here is much more specific and should help us drive a more targeted fix.
>
> The next high-value step would be a deterministic regression test around the single-split `INITIAL` path, especially validating that the reader does not enter effective streaming progress before the snapshot for that split is fully drained downstream.

@DanielLeens Thank you for the detailed analysis and confirmation. I will keep the reproduction job logs and metrics available. If a deterministic regression test or a fix PR needs more runtime evidence, I can provide the full JobManager / TaskManager logs and the source/target count SQL results.
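For reference, the invariant such a regression test could assert might be sketched roughly as below. This is only an illustration under stated assumptions: `HandoffObservation`, `snapshot_fully_drained`, and `missing_rows` are hypothetical names, not SeaTunnel APIs, and the counts are the ones reported in this issue.

```python
from dataclasses import dataclass


@dataclass
class HandoffObservation:
    """Row counts observed when the reader reports streaming mode (hypothetical model)."""
    source_rows: int   # rows in the source table covered by the snapshot
    emitted_rows: int  # rows actually delivered downstream so far


def snapshot_fully_drained(obs: HandoffObservation) -> bool:
    # The snapshot counts as drained only if every source row reached the sink.
    return obs.emitted_rows >= obs.source_rows


def missing_rows(obs: HandoffObservation) -> int:
    # Number of snapshot rows that never arrived downstream (never negative).
    return max(obs.source_rows - obs.emitted_rows, 0)


# Counts from the reproduction in this issue.
reported = HandoffObservation(source_rows=522_625, emitted_rows=93_354)
print(snapshot_fully_drained(reported), missing_rows(reported))
```

With the reported counts the check fails, which is the condition a deterministic test on the single-split `INITIAL` path would be expected to catch before streaming progress is accepted.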
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
