Re: [I] [Bug] BanyanDB 0.10.1 trace merge: "offset must be equal to bytesRead" panic in part_iter, crashes the process [skywalking]

via GitHub Wed, 06 May 2026 21:12:11 -0700


Felix-wave commented on issue #13861:
URL: https://github.com/apache/skywalking/issues/13861#issuecomment-4394091981


   Hi @wu-sheng — this is real production traffic, not test data. Trace 
payloads contain customer data (full SQL bodies, HTTP request URLs, internal 
service/endpoint names, infrastructure identifiers), so I'm afraid we can't 
share the raw `/data` directory or BanyanDB part folders directly.
   
   Happy to collaborate in any other way that helps you reproduce. Some options 
I can offer, ordered roughly by what should be easiest:
   
   1. **Workload profile, in detail.** I can describe the trace ingestion shape 
(services, agents, average trace depth, tag distributions, peak traces/sec, 
segment size distribution) so you can drive a synthetic generator to a 
comparable load. We're in the millions of segments/day range across ~30 Java 
services on agent 9.5.0.
   
   2. **Redacted / anonymized dumps.** If there's an existing tool in BanyanDB 
(or one you can point me to / send me a snippet for) that dumps **only 
structural metadata of a part** (offsets, sizes, tag-family headers, block 
boundaries — *no* tag values, no span bytes), I can run it and share the 
output. The bug seems to be in offset/bytesRead alignment, so structural 
metadata may already be sufficient to diagnose.
   
   3. **Run an instrumented build for you.** If you publish a debug 
image/binary with extra logging around 
`block_writer`/`mergeBlocks`/`partMergeIter.mustReadRaw` — e.g., logging the 
offsets, bytesRead, traceID, and the blockMetadata being written / read just 
before each panic — I can run it on KL Prod (high traffic) until we capture a 
panic and then share the redacted log snippet. We're already seeing this every 
~25 min on a fresh disk, so iteration is fast.
   
   4. **Live triage session.** I can pull arbitrary state from the cluster on 
demand: pod logs, BanyanDB HTTP API responses, `du` / `find` output of `/data`, 
schema dumps, OAP startup logs, etc. Tell me what you'd like to see and I'll 
capture it.
   
   5. **Synthetic reproducer.** If we can pin down what's special in our 
traffic (e.g., particular tag shape, specific segment patterns), I can build a 
small load generator that drives the same pattern against a sandbox BanyanDB 
and share that.
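   
   To make option 2 concrete, here is a minimal sketch of the kind of check 
that structural metadata alone would allow. Everything in it is an assumption 
for illustration: `blockMeta` and `firstMisaligned` are hypothetical names, not 
BanyanDB types or functions, and it assumes a single data file whose blocks 
are laid out back to back starting at offset 0. The real part layout may 
differ (for example, separate files per tag family).

```go
package main

import "fmt"

// blockMeta is a hypothetical, redaction-safe record for one block in a part:
// only where the block starts and how many bytes it occupies. No tag values,
// no span bytes.
type blockMeta struct {
	Offset uint64
	Size   uint64
}

// firstMisaligned walks blocks in file order and simulates a sequential
// reader. It returns the index of the first block whose Offset differs from
// the number of bytes consumed so far, together with that byte count.
// It returns -1 and the total bytes consumed if every block is contiguous.
// Assumption: the first block starts at offset 0.
func firstMisaligned(blocks []blockMeta) (int, uint64) {
	var bytesRead uint64
	for i, b := range blocks {
		if b.Offset != bytesRead {
			return i, bytesRead
		}
		bytesRead += b.Size
	}
	return -1, bytesRead
}

func main() {
	// Made-up metadata, not taken from a real part.
	blocks := []blockMeta{
		{Offset: 0, Size: 128},
		{Offset: 128, Size: 64},
		{Offset: 200, Size: 32}, // gap: the reader has consumed 192 bytes here
	}
	if i, read := firstMisaligned(blocks); i >= 0 {
		fmt.Printf("block %d: offset=%d bytesRead=%d\n", i, blocks[i].Offset, read)
	} else {
		fmt.Println("all blocks contiguous")
	}
}
```

   If a dump of this shape shows a gap or overlap in a part that later panics 
during merge, that would point at the writer side; if the metadata is 
contiguous, the reader's accounting becomes the more likely suspect.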
   
   Which of these would be the most useful starting point? Option 3 (an 
instrumented build) feels highest-signal given the failure mode is clearly 
inside the trace storage engine's offset bookkeeping, but I'll defer to 
whatever you think is most efficient.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
