JNSimba opened a new pull request, #63833:
URL: https://github.com/apache/doris/pull/63833
## Summary
- Skip flink-cdc's in-snapshot backfill by default on the from-to path, so large
splits no longer accumulate the entire chunk plus backfill stream in the fetcher's
outputBuffer. From-to is at-least-once and tolerates the duplicates this
introduces. TVF (job-driven and standalone) keeps the standard `false` default
to preserve exactly-once via per-task offset commit.
- Expose `skip_snapshot_backfill` as a user-facing property with strict
`true`/`false` validation on both from-to (CREATE JOB) and TVF (SELECT FROM
cdc_stream(...)) entry points.
- Fix snapshot completion under `pollWithoutBuffer`: a split is now marked
complete only after its high-watermark event has been consumed
(`splitState.getHighWatermark() != null`), not on the first non-empty fetcher
batch. Without this, enabling the new default truncates any split larger than
debezium's `max.batch.size` and yields an NPE on offset extraction.
- Read `streaming_task_timeout_multiplier` live in
`StreamingMultiTblTask.isTimeout()` so `admin set frontend config` affects
already-running tasks, matching the `@ConfField(mutable=true)` contract.
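
To illustrate the strict `true`/`false` validation described above, here is a minimal Java sketch. The class name, method name, and error message are hypothetical and are not taken from the Doris code base. Treating the literals as case-sensitive is also an assumption. The point it shows is that `Boolean.parseBoolean` would silently map a typo such as `foo` to `false`, so a strict check has to compare against the two literals explicitly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the validator used in this PR.
final class SkipSnapshotBackfillProperty {
    static final String KEY = "skip_snapshot_backfill";

    // Returns the parsed value, or defaultValue when the property is absent.
    // Anything other than the literals "true" / "false" is rejected
    // (case-sensitive matching is an assumption of this sketch).
    static boolean parse(Map<String, String> props, boolean defaultValue) {
        String raw = props.get(KEY);
        if (raw == null) {
            return defaultValue;
        }
        if ("true".equals(raw)) {
            return true;
        }
        if ("false".equals(raw)) {
            return false;
        }
        throw new IllegalArgumentException(
                KEY + " must be 'true' or 'false', got: " + raw);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Property absent: the caller's default applies
        // (true on the from-to path, false for the TVF, per the summary above).
        System.out.println(parse(props, true));
        System.out.println(parse(props, false));

        props.put(KEY, "foo");
        try {
            parse(props, true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing the default in as an argument lets the same check serve both entry points, which differ only in their default.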
## Test plan
- [ ] `mvn compile` passes for `fe-core` and `cdc_client`
- [ ] New `test_streaming_postgres_job_snapshot_fat_split` /
`test_streaming_mysql_job_snapshot_fat_split` pass: 2100 rows with
`snapshot_split_size=3000` (single split exceeds `max.batch.size=2048`),
asserting count=2100, distinct=2100, `id BETWEEN 2049 AND 2100`=52, and
post-snapshot DML still flows
- [ ] Existing `test_streaming_*_id_gap_completeness` /
`test_streaming_*_snapshot` / `test_streaming_*_async_split*` regressions
still pass
- [ ] Validator rejects `skip_snapshot_backfill=foo` at SQL analysis on
both CREATE JOB and `cdc_stream` TVF
- [ ] `admin set frontend config ("streaming_task_timeout_multiplier"="N")`
while a from-to task is running takes effect on the running task
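
The numbers in the fat-split test can be checked with a toy model. The Java sketch below is an illustration under stated assumptions and does not use the flink-cdc or debezium APIs. `ToyFetcher`, the `HIGH_WATERMARK` marker, and both reader methods are hypothetical. It assumes the fetcher returns at most 2048 records per poll and emits a high-watermark marker after the last row. The old completion rule (stop at the first non-empty batch) is compared with the new one (stop only once the high watermark has been consumed).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; none of these names exist in Doris or flink-cdc.
final class FatSplitSketch {
    static final int MAX_BATCH_SIZE = 2048; // mirrors max.batch.size from the test plan
    static final int HIGH_WATERMARK = -1;   // stand-in for the high-watermark event

    // Hands out row ids 1..totalRows in batches, then a high-watermark marker.
    static final class ToyFetcher {
        private final int totalRows;
        private int nextId = 1;
        private boolean watermarkEmitted = false;

        ToyFetcher(int totalRows) {
            this.totalRows = totalRows;
        }

        List<Integer> poll() {
            List<Integer> batch = new ArrayList<>();
            while (nextId <= totalRows && batch.size() < MAX_BATCH_SIZE) {
                batch.add(nextId++);
            }
            // Emit the marker once all rows are out and the batch still has room.
            if (nextId > totalRows && !watermarkEmitted && batch.size() < MAX_BATCH_SIZE) {
                batch.add(HIGH_WATERMARK);
                watermarkEmitted = true;
            }
            return batch;
        }
    }

    // Old rule: treat the split as complete after the first non-empty batch.
    static int readUntilFirstNonEmptyBatch(int totalRows) {
        ToyFetcher fetcher = new ToyFetcher(totalRows);
        int rows = 0;
        for (int id : fetcher.poll()) {
            if (id != HIGH_WATERMARK) {
                rows++;
            }
        }
        return rows;
    }

    // New rule: keep polling until the high-watermark marker has been consumed.
    static int readUntilHighWatermark(int totalRows) {
        ToyFetcher fetcher = new ToyFetcher(totalRows);
        int rows = 0;
        boolean highWatermarkSeen = false;
        while (!highWatermarkSeen) {
            for (int id : fetcher.poll()) {
                if (id == HIGH_WATERMARK) {
                    highWatermarkSeen = true;
                } else {
                    rows++;
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int oldRule = readUntilFirstNonEmptyBatch(2100);
        int newRule = readUntilHighWatermark(2100);
        System.out.println(oldRule);           // prints 2048
        System.out.println(newRule);           // prints 2100
        System.out.println(newRule - oldRule); // prints 52
    }
}
```

In this model the old rule drops exactly the rows with ids 2049 through 2100, which is why the test asserts 52 rows in that range. It also never sees the high watermark, which corresponds to the null value behind the NPE mentioned in the summary.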