[PR] Fix migration failure when row replay from large data [skywalking-banyandb]

via GitHub Wed, 03 Jun 2026 21:31:59 -0700


mrproliu opened a new pull request, #1154:
URL: https://github.com/apache/skywalking-banyandb/pull/1154


   ## Why
   
     Tiered-storage lifecycle migration falls back to **row-replay**
(re-publishing each row through the Write API) for any source part whose
segment spans more than one target segment. This is the case for stage
intervals that are coprime or otherwise not multiples of each other (e.g.
`sw_metricsHour` hot **5d** → warm **7d**, where a `[05-27, 06-01)` segment
straddles the warm `05-28` boundary).
   
     For large parts this failed with:
   
     ```
     code = DeadlineExceeded desc = context deadline exceeded
     ... confirm row-replay measure part .../seg-20260527/shard-0/...: 1 node 
error(s)
     file-based measure migration failed ... measure parts incomplete
     ```
   
     **Root cause:** row-replay sent the *entire part* on a single
client-streaming batch publisher. That stream's context deadline (the `30s`
batch timeout) was set once at stream-open and had to cover the whole part's
send. A real part of **425,940 rows** takes ~42s to build/route/marshal/publish,
so the single 30s window expired mid-part and the receiver returned
`DeadlineExceeded`. The part was then marked incomplete, the group migration
ended as "partially completed", and every retry cycle hit the same failure.
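
The failure mode can be reproduced in miniature. The sketch below is not BanyanDB code: `sendBatch`, `replaySingleDeadline`, and `replayPerBatchDeadline` are hypothetical names, and the durations are scaled down from seconds to milliseconds. It only illustrates how one deadline set at stream-open expires once the cumulative send time exceeds it, while a fresh deadline per batch does not.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sendBatch stands in for publishing one batch. It takes `cost` to finish
// and fails early if ctx expires first.
func sendBatch(ctx context.Context, cost time.Duration) error {
	select {
	case <-time.After(cost):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// replaySingleDeadline mimics the old behavior: one deadline, set once,
// has to cover every batch of the part.
func replaySingleDeadline(batches int, cost, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	for i := 0; i < batches; i++ {
		if err := sendBatch(ctx, cost); err != nil {
			return fmt.Errorf("batch %d: %w", i, err)
		}
	}
	return nil
}

// replayPerBatchDeadline gives every batch its own fresh deadline, so the
// total part duration is no longer bounded by a single timeout.
func replayPerBatchDeadline(batches int, cost, timeout time.Duration) error {
	for i := 0; i < batches; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		err := sendBatch(ctx, cost)
		cancel()
		if err != nil {
			return fmt.Errorf("batch %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	// 40 batches at 5ms each need about 200ms in total; the timeout is 100ms.
	cost, timeout := 5*time.Millisecond, 100*time.Millisecond
	fmt.Println("single deadline:", replaySingleDeadline(40, cost, timeout))
	fmt.Println("per-batch deadline:", replayPerBatchDeadline(40, cost, timeout))
}
```

The single-deadline variant fails partway through with `context deadline exceeded`, while the per-batch variant completes because no individual batch comes close to the timeout.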
   
     ## Solution
   
     Reworked row-replay so the timeout is **per batch**, not per part, and 
extracted the machinery into one shared layer:
   
     - **Per-batch timeout window:** each 2000-row batch is published on its
*own* publisher with a fresh timeout, and a new publisher is rotated in
immediately for the next batch. No single deadline ever spans the whole part,
so arbitrarily large parts complete.
     - **Bounded confirm pipeline (depth 2):** a batch's receiver-side
confirmation overlaps with building/sending the next batch (double-buffering),
cutting wall time while capping in-flight memory.
     - **Shared `batchSender` + `confirmPipeline`:** the per-type replayers
(measure/stream/trace) now just build messages and enqueue; the publisher
lifecycle (open/rotate/close), bounded overlap, and confirmation ordering live
in one place (removes ~350 lines of triplicated logic).
     - **Standard `error` instead of a custom outcome struct:** replayers
return a plain `error` (`*nodeReplayError`) that still distinguishes per-node
delivery failures (`cee`) from global send/build failures.
     - **Per-part timing instrumentation:** each part logs `build_send` vs
`confirm_wait` to pinpoint sender- vs receiver-bound migrations.
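
As a rough illustration of the first two bullets, the sketch below rotates a fresh publisher per batch and bounds the number of batches awaiting confirmation with a buffered channel. It is a simplified model, not the PR's code: the `publisher` interface, `replayPart`, and the test doubles are hypothetical, and the real `batchSender`/`confirmPipeline` may differ in ordering guarantees and error handling.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// publisher is a hypothetical per-batch publisher: Close blocks until the
// receiver has confirmed (or rejected) everything sent on it.
type publisher interface {
	Send(row int) error
	Close() error
}

// confirmPipeline bounds how many closed batches may await confirmation.
type confirmPipeline struct {
	slots chan struct{}
	wg    sync.WaitGroup
	mu    sync.Mutex
	errs  []error
}

// enqueue blocks while the pipeline is full, then confirms pub in the
// background so the caller can start building the next batch.
func (p *confirmPipeline) enqueue(pub publisher) {
	p.slots <- struct{}{}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		defer func() { <-p.slots }()
		if err := pub.Close(); err != nil {
			p.mu.Lock()
			p.errs = append(p.errs, err)
			p.mu.Unlock()
		}
	}()
}

// drain waits for every in-flight confirmation and joins their errors.
func (p *confirmPipeline) drain() error {
	p.wg.Wait()
	return errors.Join(p.errs...)
}

// replayPart sends rows in batches of batchSize, opening a fresh publisher
// (and, in a real system, a fresh timeout) for each batch.
func replayPart(rows []int, batchSize, depth int, open func() publisher) error {
	pipe := &confirmPipeline{slots: make(chan struct{}, depth)}
	var cur publisher
	sent := 0
	for _, r := range rows {
		if cur == nil {
			cur = open()
		}
		if err := cur.Send(r); err != nil {
			// Abort: drain in-flight batches, drop the unflushed one.
			return errors.Join(err, pipe.drain())
		}
		if sent++; sent == batchSize {
			pipe.enqueue(cur)
			cur, sent = nil, 0
		}
	}
	if cur != nil {
		pipe.enqueue(cur) // tail batch smaller than batchSize
	}
	return pipe.drain()
}

// recorder and fakePublisher exist only to exercise the sketch.
type recorder struct {
	mu            sync.Mutex
	batches, rows int
}

type fakePublisher struct {
	rec *recorder
	n   int
}

func (f *fakePublisher) Send(int) error { f.n++; return nil }

func (f *fakePublisher) Close() error {
	f.rec.mu.Lock()
	defer f.rec.mu.Unlock()
	f.rec.batches++
	f.rec.rows += f.n
	return nil
}

func demo(total, batchSize int) (batches, rows int, err error) {
	rec := &recorder{}
	err = replayPart(make([]int, total), batchSize, 2, func() publisher {
		return &fakePublisher{rec: rec}
	})
	return rec.batches, rec.rows, err
}

func main() {
	fmt.Println(demo(4500, 2000))
}
```

With 4,500 rows and a batch size of 2,000, the demo closes three publishers: two full batches plus a 500-row tail. The channel capacity plays the role of the depth-2 bound, since `enqueue` blocks the sender once two batches are still waiting on the receiver.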
   
     On abort, in-flight batches are still drained and the un-flushed residual 
is discarded, so a failed part is re-replayed whole on resume (no 
duplicate/partial commits).
   
     ## Validation
   
     Restored real production data and ran the migration end-to-end:
   
     - `sw_metricsHour` hot→warm: **23 parts row-replayed successfully, 0 
`DeadlineExceeded`**, incl. part `5820` (**425,940 rows, ~41.8s**) and part 
`17658` (**484,798 rows, ~61.5s**) — both far beyond the old 30s limit.
     - Timing showed `build_send ≈ 41.7s` vs `confirm_wait ≈ 0.1s`, i.e. 
sender-bound (the pipeline keeps the receiver from being the bottleneck).
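
A back-of-the-envelope check shows how much headroom the per-batch window leaves. The calculation below assumes the 2000-row batch size from the Solution section and, as a simplification the PR does not claim, that each part's reported duration is spread evenly across its batches. `batchCount` and `avgPerBatch` are illustrative helpers.

```go
package main

import (
	"fmt"
	"time"
)

// batchCount returns how many batches are needed for rows, rounding up so
// that a partial tail batch is counted.
func batchCount(rows, batchSize int) int {
	return (rows + batchSize - 1) / batchSize
}

// avgPerBatch spreads total evenly over batches, rounded to the millisecond.
func avgPerBatch(total time.Duration, batches int) time.Duration {
	return (total / time.Duration(batches)).Round(time.Millisecond)
}

func main() {
	const batchSize = 2000
	parts := []struct {
		id    string
		rows  int
		total time.Duration
	}{
		{"5820", 425940, 41800 * time.Millisecond},
		{"17658", 484798, 61500 * time.Millisecond},
	}
	for _, p := range parts {
		n := batchCount(p.rows, batchSize)
		fmt.Printf("part %s: %d batches, ~%v per batch on average\n",
			p.id, n, avgPerBatch(p.total, n))
	}
}
```

Part `5820` splits into 213 batches and part `17658` into 243, so under the even-spread assumption each batch takes a fraction of a second against a 30s window.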
   
     ## Tests
   
     Added unit tests covering the pipeline contract: in-flight depth bound, 
per-batch publisher/timeout rotation, every-row-sent across batch boundaries + 
tail, abort-drains-and-discards (build error / iterator error / node error),
     partial-failure reporting, and the `error` type's message/unwrap behavior. 
`make lint` and the package `-race` suite are green.
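
For readers unfamiliar with the pattern, the sketch below shows one way an error type can expose both a message and unwrap behavior while still separating per-node failures from global ones. Only the type name `nodeReplayError` comes from this PR; the fields, the message format, and the `isNodeFailure` helper are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeReplayError is a hypothetical reconstruction: per-node delivery
// failures keyed by node name, plus an optional global send/build error.
type nodeReplayError struct {
	nodeErrs map[string]error
	cause    error
}

// Error reports the global cause if there is one, else the node error count.
func (e *nodeReplayError) Error() string {
	if e.cause != nil {
		return fmt.Sprintf("row-replay failed: %v", e.cause)
	}
	return fmt.Sprintf("row-replay: %d node error(s)", len(e.nodeErrs))
}

// Unwrap exposes the global cause to errors.Is and errors.As.
func (e *nodeReplayError) Unwrap() error { return e.cause }

// isNodeFailure reports whether err is purely a per-node delivery failure,
// even when the error has been wrapped by callers.
func isNodeFailure(err error) bool {
	var nre *nodeReplayError
	return errors.As(err, &nre) && nre.cause == nil && len(nre.nodeErrs) > 0
}

func main() {
	perNode := fmt.Errorf("confirm part: %w", &nodeReplayError{
		nodeErrs: map[string]error{"node-1": errors.New("deadline exceeded")},
	})
	global := &nodeReplayError{cause: errors.New("marshal row")}
	fmt.Println(isNodeFailure(perNode), isNodeFailure(global))
}
```

Returning a plain `error` keeps call sites idiomatic, and callers that care about the distinction can still recover it with `errors.As`.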
   
   
   - [ ] If this pull request closes/resolves/fixes an existing issue, replace 
the issue number. Fixes apache/skywalking#<issue number>.
   - [ ] Update the [`CHANGES` 
log](https://github.com/apache/skywalking-banyandb/blob/main/CHANGES.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
