github-actions[bot] commented on code in PR #63833:
URL: https://github.com/apache/doris/pull/63833#discussion_r3317856339
##########
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java:
##########
@@ -276,6 +276,8 @@ private void checkRequiredSourceProperties() {
if (!sourceProperties.containsKey(DataSourceConfigKeys.OFFSET)) {
sourceProperties.put(DataSourceConfigKeys.OFFSET,
DataSourceConfigKeys.OFFSET_LATEST);
}
+ // from-to is at-least-once; default-skip in-snapshot backfill.
+ sourceProperties.putIfAbsent(DataSourceConfigKeys.SKIP_SNAPSHOT_BACKFILL, "true");
Review Comment:
Defaulting from-to jobs to `skip_snapshot_backfill=true` can lose source
changes that commit while a snapshot split is being read. With this flag,
`IncrementalSourceScanFetcher.pollWithoutBuffer()` returns the snapshot stream
without the buffered backfill reconciliation, and the existing
snapshot-to-binlog handoff still stores each finished split's high watermark
(`JdbcSourceOffsetProvider.updateOffset`) and creates the binlog split from the
minimum finished high watermark (`MySqlSourceReader.createBinlogSplit`, with
the same pattern for JDBC/Postgres). For a single split, an
update/delete/insert between that split's low and high watermark is neither
emitted by the snapshot backfill nor replayed by the subsequent binlog split
starting at the high watermark. Please either keep the default as `false`, or
change the handoff so skipped-backfill snapshots resume binlog from the
corresponding low watermark/earlier offset and rely on idempotent replays. The
existing concurrent-DML snapshot tests also need to run under the new default to cover this path.
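
To make the window concrete, here is a minimal, self-contained sketch of the handoff described above. It is a toy model, not Doris or Flink CDC code: `BackfillHandoffSketch`, `Change`, `sampleLog`, `replayFrom`, and the numeric offsets are all hypothetical names and values chosen for illustration. It assumes one snapshot split with low watermark 100 and high watermark 200, a row that the snapshot read before a change committed at offset 150, and backfill skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the snapshot-to-binlog handoff; every name here is hypothetical.
class BackfillHandoffSketch {
    // Watermarks of a single snapshot split (illustrative values).
    static final long LOW_WATERMARK = 100L;
    static final long HIGH_WATERMARK = 200L;

    /** A committed source change at a given log offset. */
    static final class Change {
        final long offset;
        final String description;

        Change(long offset, String description) {
            this.offset = offset;
            this.description = description;
        }

        @Override
        public String toString() {
            return offset + ": " + description;
        }
    }

    /** Source log: one change inside the split's watermark window, one after it. */
    static List<Change> sampleLog() {
        List<Change> log = new ArrayList<>();
        log.add(new Change(150L, "UPDATE id=7 (commits while the split is being read)"));
        log.add(new Change(250L, "INSERT id=9 (commits after the split finished)"));
        return log;
    }

    /** Changes the binlog phase replays when it starts reading at startOffset. */
    static List<Change> replayFrom(List<Change> log, long startOffset) {
        List<Change> replayed = new ArrayList<>();
        for (Change change : log) {
            if (change.offset >= startOffset) {
                replayed.add(change);
            }
        }
        return replayed;
    }

    public static void main(String[] args) {
        List<Change> log = sampleLog();
        // Backfill skipped, binlog starts at the high watermark:
        // the change at offset 150 is never replayed.
        System.out.println("from high watermark: " + replayFrom(log, HIGH_WATERMARK));
        // Backfill skipped, binlog starts at the low watermark:
        // the change at offset 150 is replayed; the sink must tolerate duplicates.
        System.out.println("from low watermark:  " + replayFrom(log, LOW_WATERMARK));
    }
}
```

In this model, starting from the high watermark drops the in-window update, while starting from the low watermark replays it at the cost of possibly re-applying changes the snapshot already saw. That is why the second option only works if replays are idempotent.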
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]