github-actions[bot] commented on code in PR #63079:
URL: https://github.com/apache/doris/pull/63079#discussion_r3216673920
##########
fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcTvfSourceOffsetProvider.java:
##########
@@ -312,6 +308,12 @@ public void updateOffset(Offset offset) {
*/
@Override
public void replayIfNeed(StreamingInsertJob job) throws JobException {
+ // Re-init transient split progress fields lost across FE restart.
+ // syncTables itself is persisted on StreamingInsertJob; cdcSplitProgress is rebuilt empty
+ // here and advanceSplits will resume from the system table on next tick.
+ if (cdcSplitProgress == null) {
+ initSplitProgress(job.getSyncTables());
Review Comment:
This restart guard only runs when `cdcSplitProgress` is null, but the no-arg
constructor already initializes it in the base class, so the guard is
effectively skipped. After an FE restart for a `cdc_stream` TVF job,
`replayIfNeed()` runs before `ensureInitialized()`, so `cachedSyncTables`
remains null. The restored `remainingSplits` from `streaming_job_meta` can
still be consumed, but once those are committed, `noMoreSplits()` treats the
null table cache as empty and `advanceSplitsIfNeed()` never fetches the next
batch. A mid-snapshot restart therefore truncates the snapshot to only the
chunks already present in the meta table.

Please always restore the transient split progress/table cache here, or at
least call `initSplitProgress(job.getSyncTables())` when `cachedSyncTables` is
null as well. Please also add a TVF restart test that consumes the restored
chunks and then verifies that another batch is fetched.
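
To illustrate the difference between the two guards, here is a minimal, self-contained sketch. It is not the Doris implementation: `RestartGuardSketch`, `Provider`, `replayOriginal`, and `replayWidened` are hypothetical names, the field types are placeholders, and `noMoreSplits()` is reduced to the single "null cache is treated as empty" behavior described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RestartGuardSketch {

    /** Hypothetical stand-in for the offset provider; only the field names come from the review. */
    static class Provider {
        // Eagerly initialized, mirroring the base-class constructor described in the review.
        Map<String, Integer> cdcSplitProgress = new HashMap<>();
        // Transient table cache: null after a restart until something rebuilds it.
        List<String> cachedSyncTables = null;

        // Assumed behavior: rebuild both transient fields from the persisted table list.
        void initSplitProgress(List<String> syncTables) {
            cachedSyncTables = new ArrayList<>(syncTables);
            cdcSplitProgress = new HashMap<>();
        }

        // Guard as in the diff: never fires, because cdcSplitProgress is already non-null.
        void replayOriginal(List<String> syncTables) {
            if (cdcSplitProgress == null) {
                initSplitProgress(syncTables);
            }
        }

        // Widened guard suggested in the review: also fires when the table cache is missing.
        void replayWidened(List<String> syncTables) {
            if (cdcSplitProgress == null || cachedSyncTables == null) {
                initSplitProgress(syncTables);
            }
        }

        // Simplified: a null table cache is treated the same as "no tables left".
        boolean noMoreSplits() {
            return cachedSyncTables == null || cachedSyncTables.isEmpty();
        }
    }

    public static void main(String[] args) {
        List<String> syncTables = List.of("db.orders");

        Provider original = new Provider();
        original.replayOriginal(syncTables);
        System.out.println("original guard, noMoreSplits = " + original.noMoreSplits());

        Provider widened = new Provider();
        widened.replayWidened(syncTables);
        System.out.println("widened guard, noMoreSplits = " + widened.noMoreSplits());
    }
}
```

In this sketch the original guard leaves `cachedSyncTables` null, so `noMoreSplits()` reports true and no further batch would be requested. The widened guard rebuilds the cache, so `noMoreSplits()` reports false.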
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]