LuciferYang opened a new pull request, #364:
URL: https://github.com/apache/doris-spark-connector/pull/364

   ## Problem
   
   The stream load write path is not exactly-once across a client-side retry.
   
   When a stream load batch fails on the client side, the rows may have *actually
   committed* on the backend, for example when the HTTP response is lost while the
   task thread is interrupted (the processor interrupts the task thread on an async
   failure), or when a concurrent DDL races the load. Because every batch (including
   each retry) is sent with a **freshly generated label**, the backend cannot
   recognize the retry as a duplicate, so the rows get written twice.
   
   ## Fix
   
   - Reuse the previous batch's label when retrying a **failed** batch. A fresh label
     is still minted for every new (committed) batch, so normal writes are unchanged.
   - Treat a `Label Already Exists` response **on a reused label** as success: the
     original batch already committed under that label, so the retry is an idempotent
     no-op and the backend correctly rejects the duplicate.
   
   Together these make the retry path exactly-once.
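
The two rules above can be sketched in isolation. This is a minimal illustration, not the connector's actual code: `FakeBackend`, `writeBatch`, `newLabel`, and the `reuseLabel` flag are hypothetical names, and the backend is an in-memory stand-in that is assumed to deduplicate loads by label and to "lose" one response after committing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

class LabelReuseSketch {

    /** Hypothetical in-memory backend that rejects a label it has already committed. */
    static class FakeBackend {
        final Set<String> committedLabels = new HashSet<>();
        final List<String> rows = new ArrayList<>();
        boolean loseNextResponse = false;

        String load(String label, List<String> batch) throws IOException {
            if (!committedLabels.add(label)) {
                return "Label Already Exists";
            }
            rows.addAll(batch); // the batch commits...
            if (loseNextResponse) {
                loseNextResponse = false;
                // ...but the client never sees the response.
                throw new IOException("response lost after commit");
            }
            return "Success";
        }
    }

    static String newLabel() {
        return "sketch_" + UUID.randomUUID();
    }

    /**
     * Writes one batch, retrying on client-side failures.
     * reuseLabel = false models the old behavior (fresh label per attempt);
     * reuseLabel = true models the fix described above.
     */
    static void writeBatch(FakeBackend backend, List<String> batch,
                           int maxRetries, boolean reuseLabel) throws IOException {
        String label = newLabel();
        boolean retrying = false;
        for (int attempt = 0; ; attempt++) {
            String status;
            try {
                status = backend.load(label, batch);
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
                retrying = true;
                if (!reuseLabel) {
                    label = newLabel(); // old behavior: backend cannot detect the duplicate
                }
                continue;
            }
            if (status.equals("Success")) {
                return;
            }
            // Only a retry that reused its label may treat this status as success.
            if (status.equals("Label Already Exists") && retrying && reuseLabel) {
                return;
            }
            throw new IllegalStateException("stream load failed: " + status);
        }
    }
}
```

In this sketch, a two-row batch whose first response is lost ends up as four rows in `FakeBackend.rows` with `reuseLabel = false` and as two rows with `reuseLabel = true`. A `Label Already Exists` status on a first attempt is still treated as an error, since nothing shows that the rows under that label came from this batch.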
   
   ## Notes
   
   The race is reliably reproducible under Spark 4.x (its task-thread unparking
   shortens the lost-response window and makes the retry fire); on Spark 2.x/3.x the
   same hole exists but is rare. It is exercised by `DorisWriterFailoverITCase`
   (`testFailoverForRetry`) under the Spark 4.0 module.
   
   ## Test Plan
   
   - Existing `DorisWriterFailoverITCase` continues to pass on 2.x/3.x.
   - Validated exactly-once (no duplicate rows) via the Spark 4.0 integration suite.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

