LuciferYang opened a new pull request, #364:
URL: https://github.com/apache/doris-spark-connector/pull/364
## Problem
The stream load write path is not exactly-once across a client-side retry. When a stream load batch fails on the client side, the rows may have *actually committed* on the backend: for example, the HTTP response is lost while the task thread is interrupted (the processor interrupts the task thread on an async failure), or a concurrent DDL races the load. Because every batch (including each retry) is sent with a **freshly generated label**, the backend cannot recognize the retry as a duplicate, so the rows get written twice.
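To make the failure mode concrete, here is a toy model, not connector code. `FakeBackend`, `ResponseLost`, and `load_with_fresh_labels` are hypothetical names. The backend only models the label-uniqueness behavior described above: a label can be committed once. The client mints a new label on every attempt. The first attempt commits but its response is "lost", modeled as an exception raised after the commit, so the retry goes out under a new label and the rows land twice.

```python
import uuid


class ResponseLost(Exception):
    """Models a client-side failure that happens *after* the backend committed."""


class FakeBackend:
    """Hypothetical stand-in for the backend: duplicates are detected only by label."""

    def __init__(self):
        self.committed_labels = set()
        self.rows = []

    def stream_load(self, label, batch):
        if label in self.committed_labels:
            raise ValueError(f"Label Already Exists: {label}")
        self.committed_labels.add(label)
        self.rows.extend(batch)


def load_with_fresh_labels(backend, batch, lose_first_response=True, max_retries=3):
    """Buggy client (hypothetical): every attempt, including retries, gets a new label."""
    for attempt in range(max_retries + 1):
        label = f"spark-{uuid.uuid4()}"  # fresh label per attempt: the root cause
        try:
            backend.stream_load(label, batch)
            if attempt == 0 and lose_first_response:
                raise ResponseLost()  # commit succeeded, but the client never hears back
            return
        except ResponseLost:
            continue  # the client assumes the batch failed and retries
    raise RuntimeError("retries exhausted")


backend = FakeBackend()
load_with_fresh_labels(backend, ["row-1", "row-2"])
print(backend.rows)  # ['row-1', 'row-2', 'row-1', 'row-2']: written twice
```

Because the two attempts carry different labels, the backend sees two unrelated loads and has no basis for rejecting the second one.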
## Fix
- When retrying a **failed** batch, reuse the label from the failed attempt instead of generating a new one. A fresh label is still minted for every new (committed) batch, so normal writes are unchanged.
- Treat a `Label Already Exists` response **on a reused label** as success: the original batch already committed under that label, so the retry is an idempotent no-op and the backend correctly rejects the duplicate.
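The two rules above can be sketched against a toy backend. This is a minimal model under stated assumptions, not the connector's actual implementation. `FakeBackend`, `LabelAlreadyExists`, `ResponseLost`, `BatchLabeler`, and `load_batch` are all hypothetical, and the `Label Already Exists` response is modeled as an exception. The labeler mints a new label only when a new batch starts and hands back the same label for a retry. The retry loop treats `LabelAlreadyExists` as success only when the label it just sent was a reused one.

```python
import uuid


class LabelAlreadyExists(Exception):
    """Toy stand-in for the backend's 'Label Already Exists' response."""


class ResponseLost(Exception):
    """Models a client-side failure that happens after the backend committed."""


class FakeBackend:
    """Hypothetical backend stand-in: a label can be committed at most once."""

    def __init__(self, lose_responses=0):
        self.committed_labels = set()
        self.rows = []
        self.lose_responses = lose_responses  # number of successful commits to "lose"

    def stream_load(self, label, batch):
        if label in self.committed_labels:
            raise LabelAlreadyExists(label)
        self.committed_labels.add(label)
        self.rows.extend(batch)
        if self.lose_responses > 0:
            self.lose_responses -= 1
            raise ResponseLost()  # committed, but the client never sees the reply


class BatchLabeler:
    """Hypothetical helper: fresh label per new batch, same label for its retries."""

    def __init__(self, prefix="spark"):
        self.prefix = prefix
        self.current = None

    def for_new_batch(self):
        self.current = f"{self.prefix}-{uuid.uuid4()}"
        return self.current

    def for_retry(self):
        return self.current  # reuse the failed attempt's label


def load_batch(backend, labeler, batch, max_retries=3):
    label = labeler.for_new_batch()
    reused = False
    for _ in range(max_retries + 1):
        try:
            backend.stream_load(label, batch)
            return label
        except LabelAlreadyExists:
            if reused:
                return label  # an earlier attempt already committed: idempotent no-op
            raise  # collision on a freshly minted label is a genuine error
        except ResponseLost:
            label, reused = labeler.for_retry(), True
    raise RuntimeError("retries exhausted")


backend = FakeBackend(lose_responses=1)
labeler = BatchLabeler()
first = load_batch(backend, labeler, ["row-1", "row-2"])  # response lost, then retried
second = load_batch(backend, labeler, ["row-3"])           # normal write, new label
print(backend.rows)  # ['row-1', 'row-2', 'row-3']
```

In this model, if a failed attempt had *not* committed, the reused label would still be unused, so the retry would simply load the batch. The success shortcut is limited to reused labels, presumably so that a collision on a freshly minted label still surfaces as an error rather than silently dropping a batch.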
Together these make the retry path exactly-once.
## Notes
The race is reliably reproducible under Spark 4.x (its task-thread unparking shortens the lost-response window and makes the retry fire); on Spark 2.x/3.x the same hole exists but is rare. It is exercised by `DorisWriterFailoverITCase` (`testFailoverForRetry`) under the Spark 4.0 module.
## Test Plan
- Existing `DorisWriterFailoverITCase` continues to pass on 2.x/3.x.
- Validated exactly-once (no duplicate rows) via the Spark 4.0 integration suite.