Felix-wave opened a new issue, #13861:
URL: https://github.com/apache/skywalking/issues/13861
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no similar issues. (This differs from the related #13860: that one is in the trace **write** path and is recovered by the gRPC interceptor; this one is in the trace **merge/read** path and crashes the process.)

### Apache SkyWalking Component

BanyanDB

### What happened

After upgrading from `apache/skywalking-banyandb:0.9.0` to `0.10.1` (with OAP 10.4.0), BanyanDB **crashes the process** every ~7-8 minutes with:

```
panic: offset 1400877 must be equal to bytesRead 1400490
```

Unlike the timestamp-ordering panic in #13860 (which is recovered by `grpc-middleware`), this one fires from a **background `mergeLoop` goroutine** that is not wrapped by recovery, so the process exits and the pod restarts.

#### Full stack

```
goroutine 3900 [running]:
github.com/apache/skywalking-banyandb/pkg/logger.Panicf(...)
github.com/apache/skywalking-banyandb/banyand/trace.(*partMergeIter).mustReadRaw(0xc001ac4000, 0xc002d716b8, 0xc001ac4118)
	/mnt/d/skywalking-banyandb/banyand/trace/part_iter.go:359 +0xf5
github.com/apache/skywalking-banyandb/banyand/trace.(*blockReader).mustReadRaw(...)
	/mnt/d/skywalking-banyandb/banyand/trace/block_reader.go:263
github.com/apache/skywalking-banyandb/banyand/trace.mergeBlocks(...)
	/mnt/d/skywalking-banyandb/banyand/trace/merger.go:421 +0x79e
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeParts(...)
	/mnt/d/skywalking-banyandb/banyand/trace/merger.go:344 +0x42a
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergePartsThenSendIntroduction(...)
	/mnt/d/skywalking-banyandb/banyand/trace/merger.go:118 +0x145
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeSnapshot(...)
	/mnt/d/skywalking-banyandb/banyand/trace/merger.go:104 +0x125
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeLoop.func1(...)
	/mnt/d/skywalking-banyandb/banyand/trace/merger.go:78 +0x1f9
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeLoop(...)
	/mnt/d/skywalking-banyandb/banyand/trace/merger.go:90 +0x271
created by github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).startLoop in goroutine 157
	/mnt/d/skywalking-banyandb/banyand/trace/tstable.go:130 +0x246
```

#### Source location (apache/skywalking-banyandb v0.10.1)

`banyand/trace/part_iter.go:354-365`:

```go
func (pmi *partMergeIter) mustReadRaw(r *rawBlock, bm *blockMetadata) {
	r.bm = bm
	// spans
	if bm.spans != nil && bm.spans.size > 0 {
		// Validate the reader is aligned to the expected offset
		if bm.spans.offset != pmi.seqReaders.spans.bytesRead {
			logger.Panicf("offset %d must be equal to bytesRead %d", bm.spans.offset, pmi.seqReaders.spans.bytesRead)
		}
		...
	}
	...
}
```

So the merger reads spans sequentially from `seqReaders.spans`, and each block's `bm.spans.offset` is expected to match how far the `seqReader` has advanced (`bytesRead`). When the two diverge (by 387 bytes in our sample: 1400877 - 1400490), the merger panics.

The same pattern (`offset must be equal to bytesRead`) appears at:

- `banyand/trace/block.go:196` (tag metadata)
- `banyand/trace/block.go:329` (span data)
- `banyand/internal/sidx/block.go`
- `banyand/measure/block.go`
- `banyand/stream/block.go`

So the invariant is repeated throughout the new (0.10) trace storage engine, and the same check also exists in the measure and stream block code.

### Cadence and impact

In our cluster, the BanyanDB pod restarted **126 times in 17 hours**, roughly once every 8 minutes. Each time, OAP loses its connection to BanyanDB and crash-loops as well (~148 OAP restarts in the same window). Net effect: rolling outages, with a 1-2 minute window every ~8 minutes during which ingestion and queries fail.

For comparison, on 0.9.0 the only panic we saw fired about once every 28 minutes.
**0.10.1 is significantly less stable on our workload, primarily because of this new panic in the merger.**

### What you expected to happen

The merger should not panic on block metadata that is clearly corrupted or out of sync. Reasonable options (maintainers know best):

1. **Skip the offending block** with a warning instead of `Panicf`. At a minimum, this limits the blast radius to one block instead of restarting the whole DB.
2. **Reset the seqReader** to the offset declared in `bm.spans.offset` (or vice versa) when divergence is detected. This assumes the metadata is the source of truth.
3. **Fail the merge of the affected part** but keep the process running, and let retention/cleanup eventually drop the corrupted part.

### How to reproduce

Steady-state SkyWalking deployment, with OAP forwarding traces to a standalone BanyanDB. We see this on:

- BanyanDB: `apache/skywalking-banyandb:0.10.1`
- SkyWalking OAP: `apache/skywalking-oap-server:10.4.0`
- 30+ Java services, `apache-skywalking-java-agent` 9.5.0, JDK 21
- Standalone BanyanDB on Kubernetes (Aliyun ACK), `--trace-root-path=/data/trace`
- 51 GB cumulative on disk after 17h of ingest (stream 38.5G + trace 12.9G + measure 26M)

The very first occurrence happens within ~30 minutes of starting fresh (after fully wiping `/data` and letting OAP recreate the schemas). After that, the panic cadence stabilizes at ~8 minutes.

### Anything else

This bug is in the new trace storage engine introduced by #713 in 0.10.0. We did not see this panic on 0.9.0, which uses the older trace path. We have already reported the related but distinct timestamp-ordering panic in the **write** path as #13860 (recoverable, not crashing the process). We are filing this one separately because its failure mode (background merge goroutine, no recovery, full process exit) is different and arguably more disruptive.

Happy to gather more samples on request (full stack traces over time, sample part dumps if a tool exists, sysrq dumps, anything).
### Are you willing to submit a pull request to fix on your own

- [ ] Yes, I am willing to submit a pull request on my own!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
