Felix-wave commented on issue #13861: URL: https://github.com/apache/skywalking/issues/13861#issuecomment-4394091981
Hi @wu-sheng, this is real production traffic, not test data. Trace payloads contain customer data (full SQL bodies, HTTP request URLs, internal service/endpoint names, infrastructure identifiers), so I'm afraid we can't share the raw `/data` directory or BanyanDB part folders directly.

Happy to collaborate in any other way that helps you reproduce. Some options I can offer, ordered roughly by what should be easiest:

1. **Workload profile, in detail.** I can describe the trace ingestion shape (services, agents, average trace depth, tag distributions, peak trace/sec, segment size distribution) so you can drive a synthetic generator to a comparable load. We're in the millions of segments/day range across ~30 Java services on agent 9.5.0.

2. **Redacted / anonymized dumps.** If there's an existing tool in BanyanDB (or one you can point me to / send me a snippet for) that dumps **only structural metadata of a part** (offsets, sizes, tag-family headers, block boundaries; *no* tag values, no span bytes), I can run it and share the output. The bug seems to be in offset/bytesRead alignment, so structural metadata may already be sufficient to diagnose.

3. **Run an instrumented build for you.** If you publish a debug image/binary with extra logging around `block_writer`/`mergeBlocks`/`partMergeIter.mustReadRaw` (e.g., logging the offsets, bytesRead, traceID, and the blockMetadata being written / read just before each panic), I can run it on KL Prod (high traffic) until we capture a panic and then share the redacted log snippet. We're already seeing this every ~25 min on a fresh disk, so iteration is fast.

4. **Live triage session.** I can pull arbitrary state from the cluster on demand: pod logs, banyandb HTTP API responses, `du` / `find` output of `/data`, schema dumps, OAP startup logs, etc. Tell me what you'd like to see and I'll capture it.

5. **Synthetic reproducer.** If we can pin down what's special in our traffic (e.g., particular tag shape, specific segment patterns), I can build a small load generator that drives the same pattern against a sandbox BanyanDB and share that.

Which of these would be the most useful starting point? Option 3 (an instrumented build) feels highest-signal given the failure mode is clearly inside the trace storage engine's offset bookkeeping, but I'll defer to whatever you think is most efficient.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
