Felix-wave opened a new issue, #13860:
URL: https://github.com/apache/skywalking/issues/13860

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no 
similar issues. (No matches for the panic message in either `apache/skywalking` 
or `apache/skywalking-banyandb`.)
   
   ### Apache SkyWalking Component
   
   BanyanDB
   
   ### What happened
   
   `banyand/trace/block_writer.go` panics when blocks for the same `traceID` 
arrive with `tm.min < minTimestampLast`. On a real SkyWalking OAP 10.4.0 
production workload this fires roughly once per minute, with each panic 
discarding one trace-write batch.
   
   The offending check (`apache/skywalking-banyandb` master branch):
   
   
https://github.com/apache/skywalking-banyandb/blob/master/banyand/trace/block_writer.go#L259-L262
   
   ```go
   if isSeenTid && tm.min < bw.minTimestampLast {
       logger.Panicf("the block for tid=%s cannot contain timestamp smaller than %d, but it contains timestamp %d", tid, bw.minTimestampLast, tm.min)
   }
   ```
   
   The same pattern exists at line 320 of the same file and in 
`banyand/{stream,measure}/block_writer.go`.
   
   #### Sample panics from a production cluster (banyandb 0.10.1, OAP 10.4.0)
   
   ```
   the block for tid=6359c73a002c425785500f958cdc4007.661.17779941290647127 cannot contain timestamp smaller than 1777994129833000000, but it contains timestamp 1777994129064000000
   the block for tid=6359c73a002c425785500f958cdc4007.661.17779941958867471 cannot contain timestamp smaller than 1777994196635000000, but it contains timestamp 1777994196633000000
   the block for tid=6359c73a002c425785500f958cdc4007.661.17779944351678775 cannot contain timestamp smaller than 1777994435391000000, but it contains timestamp 1777994435191000000
   ```
   
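   Skew between `minTimestampLast` and incoming `tm.min` ranges from ~2ms to 
~770ms. All three samples share the `tid` prefix 
`6359c73a002c425785500f958cdc4007.661` and differ only in the final component. 
In the Java agent's ID scheme that prefix identifies the generating instance 
and thread, so these appear to be three traces from one source rather than a 
single trace.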
   
   #### Why this fires on normal traffic
   
   In SkyWalking, a single trace is composed of segments produced by multiple 
Java agents on different services. Wall clocks across services are not 
perfectly synchronized (NTP drift, container clocks), and inter-service hops 
can take less than a millisecond, so timestamps taken on different hosts do not 
necessarily follow causal order. When OAP forwards segments belonging to one 
traceID to BanyanDB, segments can arrive in batches whose `min(timestamp)` is 
slightly earlier than that of a previously flushed block for the same traceID.
   
   The `block_writer` treats this as a programming invariant violation and 
panics; in practice it is normal upstream input.
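
To make the failure mode concrete, here is a toy model of the comparison quoted above. It is not BanyanDB code: `toyWriter` and its fields are hypothetical, it returns an error where the real code calls `logger.Panicf`, and it assumes `minTimestampLast` simply tracks the minimum timestamp of the previously written block.

```go
package main

import "fmt"

// toyWriter is a hypothetical stand-in for the block writer's ordering state.
type toyWriter struct {
	lastTid          string
	minTimestampLast int64
}

// write mirrors the quoted check: a block for an already-seen tid must not
// start earlier than the previous block did.
func (w *toyWriter) write(tid string, tmMin int64) error {
	isSeenTid := tid == w.lastTid
	if isSeenTid && tmMin < w.minTimestampLast {
		return fmt.Errorf("tid=%s: timestamp %d is smaller than %d",
			tid, tmMin, w.minTimestampLast)
	}
	// Assumption: the writer remembers the latest block's minimum timestamp.
	w.lastTid = tid
	w.minTimestampLast = tmMin
	return nil
}

func main() {
	w := &toyWriter{}
	// Two blocks for the same tid; the second carries a timestamp 2ms earlier,
	// as in the second sample panic.
	fmt.Println(w.write("trace-a", 1777994196635000000)) // <nil>
	fmt.Println(w.write("trace-a", 1777994196633000000)) // ordering error
}
```

In this model a 2ms backward step on the same tid is enough to trip the check, which matches the smallest skew in the samples.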
   
   #### Impact
   
   - Every panic is recovered by the gRPC stream interceptor, so the server 
keeps running, but the **single in-flight trace-write batch is dropped**.
   - On 0.9.0 the same panic also fires; in our environment it left the gRPC 
server in a degraded state and pods restarted ~406 times over 7 days. On 0.10.1 
the recovery is clean (pod stays up, 0 restarts), but trace data loss continues 
at ~1 batch/minute.
   - Net effect: under any non-trivial trace volume, BanyanDB silently sheds a 
small fraction of traces and produces a continuous stream of stack traces.
   
   ### What you expected to happen
   
   BanyanDB should accept slightly out-of-order timestamps within the same 
traceID without panicking and without dropping the batch.
   
   Suggested directions (the maintainers likely know better which is 
appropriate):
   
   1. **Sort incoming blocks by `tm.min` per traceID before writing**, instead 
of asserting input order.
   2. **Demote to a warning + drop only the offending block** rather than 
`Panicf`, so the rest of the batch is durable.
   3. If strict ordering is required for an internal index, **track per-traceID 
`minTimestampLast` and tolerate small backward skew** within a configurable 
window.
   
   ### How to reproduce
   
   Steady-state SkyWalking deployment with multiple Java agents reporting 
traces to OAP, OAP backed by BanyanDB. We see this on:
   
   - BanyanDB: `apache/skywalking-banyandb:0.10.1` (also reproduced on 0.9.0)
   - SkyWalking OAP: `apache/skywalking-oap-server:10.4.0`
   - ~30+ services, `apache-skywalking-java-agent` 9.5.0, JDK 21
   - Standalone BanyanDB on Kubernetes (Aliyun ACK), 
`--trace-root-path=/data/trace`
   - No special configuration; typical SkyWalking trace volume (GiBs/day)
   
   Within ~3 minutes of OAP starting, the first panic appears in BanyanDB logs; 
cadence stabilizes at roughly one panic per minute.
   
   ### Anything else
   
   Relevant code references in `apache/skywalking-banyandb`:
   
   - `banyand/trace/block_writer.go:261` — first Panicf
   - `banyand/trace/block_writer.go:322` — second Panicf
   - Same pattern in `banyand/stream/block_writer.go` and 
`banyand/measure/block_writer.go`
   
   Happy to provide more samples (full stack traces, debug logs, longer time 
series) or test a candidate fix on our cluster.
   
   ### Are you willing to submit a pull request to fix on your own
   
   - [ ] Yes, I am willing to submit a pull request on my own!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)

