hanahmily commented on issue #13861:
URL: https://github.com/apache/skywalking/issues/13861#issuecomment-4394424605
Thanks for the clean fresh-disk experiment, @Felix-wave — that pinned down
"this is not 0.9 file residue."
Update from this side: I have a unit test that reproduces the exact panic
message verbatim by truncating `spans.bin` at a block boundary while leaving
`metadata.json`/`primary.bin` intact:
```
panic: offset 5300 must be equal to bytesRead 5247
```
Same shape as your `offset 1400877 must be equal to bytesRead 1400490`. The
on-disk state required to trigger it is what you'd see after a hard kill
(SIGKILL/OOM/eviction) mid-merge: BanyanDB's merge write path doesn't `fsync`
data files before writing `metadata.json`, so a kill between writeback and
metadata commit leaves a torn part that survives across restarts and re-trips
the panic on every wake-up of the merge loop. That matches the ~8-minute crash
cadence in your cluster — once you have one torn part, the loop perpetuates.
Audit findings + proposed fix: #13862.
The one piece I haven't been able to pin down from your report is what
triggers the *first* tear on a fresh disk — your "first occurrence ~30 min
after wipe, then ~25 min cadence" doesn't square with a self-perpetuating cycle
alone, since the cycle needs an initial torn part to start. Two quick questions
that would discriminate:
1. **Pod exit history before the first panic.** Could you run, on one of the
crashing pods:
```
kubectl get events -n <ns> --sort-by='.lastTimestamp' | grep banyandb
kubectl describe pod <banyandb-pod> -n <ns> | grep -A5 "Last State"
```
What I'm looking for: any `OOMKilling`, `Evicted`, `Killing`, or
`BackOff` events *before* the first panic-driven exit, and the `Reason` on the
previous termination. If we see `OOMKilled` or `Evicted`, that is very likely
the kill that produced the first torn write.
2. **Memory limits + utilization.** What's the BanyanDB pod's memory
request/limit, and do you have a memory-utilization graph for the first 30-60
minutes of a fresh-disk run? OAP backfill into a fresh BanyanDB hits the merge
loop hard right after schema creation; if the pod is brushing the limit there,
an OOMKill is the most plausible first trigger.
Independent of what turns out to be the first cause, the durability +
read-side fixes proposed in #13862 will stop the perpetuation: once they land,
even a torn write from an OOMKill won't take down the merger — the affected
part is quarantined and the loop continues.
Happy to share the reproducer test (it's a single self-contained Go test
against `tst.mergeParts`) if you'd like to run it locally to convince yourself
the panic shape matches.