hanahmily commented on issue #13861:
URL: https://github.com/apache/skywalking/issues/13861#issuecomment-4394424605

   Thanks for the clean fresh-disk experiment, @Felix-wave — that pinned down 
"this is not 0.9 file residue."
   
   Update from this side: I have a unit test that reproduces the exact panic 
message verbatim by truncating `spans.bin` at a block boundary while leaving 
`metadata.json`/`primary.bin` intact:
   
   ```
   panic: offset 5300 must be equal to bytesRead 5247
   ```
   
   Same shape as your `offset 1400877 must be equal to bytesRead 1400490`. The 
on-disk state required to trigger it is what you'd see after a hard kill 
(SIGKILL/OOM/eviction) mid-merge: BanyanDB's merge write path doesn't `fsync` 
data files before writing `metadata.json`, so a kill between writeback and 
metadata commit leaves a torn part that survives across restarts and re-trips 
the panic on every wake-up of the merge loop. That matches the ~8-minute crash 
cadence in your cluster — once you have one torn part, the loop perpetuates.
   
   Audit findings + proposed fix: #13862.
   
   The one piece I haven't been able to pin down from your report is what 
triggers the *first* tear on a fresh disk — your "first occurrence ~30 min 
after wipe, then ~25 min cadence" doesn't square with a self-perpetuating cycle 
alone, since the cycle needs an initial torn part to start. Two quick questions 
that would discriminate:
   
   1. **Pod exit history before the first panic.** Could you run, on one of the 
crashing pods:
      ```
      kubectl get events -n <ns> --sort-by='.lastTimestamp' | grep banyandb
      kubectl describe pod <banyandb-pod> -n <ns> | grep -A5 "Last State"
      ```
      What I'm looking for: any `OOMKilling`, `Evicted`, `Killing`, or 
`BackOff` events *before* the first panic-driven exit, and the `Reason` on the 
previous termination. If we see `OOMKilled` or `Evicted`, that's the 
first-cause torn write.
   
   2. **Memory limits + utilization.** What's the BanyanDB pod's memory 
request/limit, and do you have a memory-utilization graph for the first 30-60 
minutes of a fresh-disk run? OAP backfill into a fresh BanyanDB hits the merge 
loop hard right after schema creation; if the pod is brushing the limit there, 
an OOMKill is the most plausible first trigger.
   
   Independent of what turns out to be the first cause, the durability + 
read-side fixes proposed in #13862 will stop the perpetuation: once they land, 
even a torn write from an OOMKill won't take down the merger — the affected 
part is quarantined and the loop continues.
   
   Happy to share the reproducer test (it's a single self-contained Go test 
against `tst.mergeParts`) if you'd like to run it locally to convince yourself 
the panic shape matches.
   

