hanahmily opened a new issue, #13874:
URL: https://github.com/apache/skywalking/issues/13874

   ## Summary
   
   During a 48-hour soak of standalone BanyanDB, goroutine count grew from 556 
to 708 (+27%). The likely root cause is that each segment rotation opens a new 
bluge index writer, which spawns its own analysis-worker pool, and the pool of 
the previous writer is never released once that writer goes idle.
   
   ## Reproduction
   
   1. Run standalone BanyanDB with a measure group using `SegmentInterval: 1 
day`.
   2. Drive sustained write traffic for ≥48 h (any continuous write workload 
that crosses a UTC midnight reproduces; in our setup we used SkyWalking OAP 
traffic plus a synthetic ~1000-row/day fixture).
   3. Sample `/debug/pprof/goroutine?debug=1` every 30 min.
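
For step 3, the total can be read from the first line of the `debug=1` text output, which starts with `goroutine profile: total N`. A minimal sketch of that parsing, assuming the profile text has already been fetched; `parseGoroutineTotal` and the sample string are illustrative names and data, not part of BanyanDB:

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// parseGoroutineTotal returns N from the "goroutine profile: total N" header
// of a debug=1 goroutine profile. It is a hypothetical helper for this report.
func parseGoroutineTotal(profile string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(profile))
	for sc.Scan() {
		var n int
		// Sscanf only succeeds on the header line; other lines are skipped.
		if _, err := fmt.Sscanf(sc.Text(), "goroutine profile: total %d", &n); err == nil {
			return n, nil
		}
	}
	return 0, errors.New("goroutine profile header not found")
}

func main() {
	// Illustrative input shaped like the start of a debug=1 response.
	sample := "goroutine profile: total 556\n1 @ 0x1 0x2\n"
	n, err := parseGoroutineTotal(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("goroutines:", n)
}
```

Calling this on each 30-minute sample yields the series summarized in the table under "Observed pattern".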
   
   ## Observed pattern
   
   | Time (UTC) | Goroutines | Δ | Note |
   |---|---|---|---|
   | t0 | 556 | — | baseline |
   | t0 + ~21 h (first UTC midnight crossed) | 556 → 632 | +76 | first segment rotation event |
   | t0 + ~45 h (second UTC midnight crossed) | 632 → 708 | +76 | second event, identical shape |
   
   The two events are spaced exactly 24 h apart and add ~76 goroutines each. 
Between events, goroutine count is **flat to within ±1** — no steady leak, only 
a daily step.
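
The "flat, then step" shape can be separated from a steady leak mechanically by flagging only those consecutive samples whose difference exceeds the ±1 noise band. A small sketch of that check; `findSteps` is a hypothetical helper, and the sample series is built from the table values with made-up ±1 jitter rather than copied from the real capture:

```go
package main

import "fmt"

// step records a jump between two consecutive samples.
type step struct {
	Index int // index of the sample where the jump lands
	Delta int // change relative to the previous sample
}

// findSteps returns every consecutive-sample change larger than tolerance
// in either direction.
func findSteps(samples []int, tolerance int) []step {
	var steps []step
	for i := 1; i < len(samples); i++ {
		d := samples[i] - samples[i-1]
		if d > tolerance || d < -tolerance {
			steps = append(steps, step{Index: i, Delta: d})
		}
	}
	return steps
}

func main() {
	// Illustrative series only: plateaus at 556, 632 and 708 with ±1 jitter.
	samples := []int{556, 556, 557, 556, 632, 632, 631, 632, 708, 708}
	for _, s := range findSteps(samples, 1) {
		fmt.Printf("step at sample %d: %+d\n", s.Index, s.Delta)
	}
}
```

A steady leak would show many small deltas instead of a few large ones, so this check would report either nothing or almost every sample, depending on the tolerance.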
   
   ## Stack analysis
   
   Diffing the goroutine profile between start and end:
   
   - **+108 in `bluge/index.analysisWorker`** at 
`github.com/blugelabs/bluge/index.OpenWriter.func1` (writer.go:77 → 
analysisWorker at writer.go:667).
   - The remaining ~44 are orchestration goroutines around the new writers 
(waiters, transmit loops).
   - Every other stack signature (e.g. `pkg/flow.Transmit`, 
`grpc/internal/grpcsync.CallbackSerializer.run`) is **identical count** start 
vs end.
   
   The 108 = 2 events × ~54 analysisWorkers per writer. bluge sizes this pool 
from GOMAXPROCS (host had 32 CPUs, default pool size ~54), so per-event growth 
scales with the worker host's GOMAXPROCS.
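
The per-signature diff can be reproduced from two `debug=1` dumps by summing the counts of every stack group that mentions a given frame. The sketch below assumes the standard text layout (a `<count> @ <pcs>` header followed by `#`-prefixed frame lines); `countGoroutinesWithFrame` is a hypothetical helper, and the two profiles are hand-written stand-ins, not the captured data:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// countGoroutinesWithFrame sums the goroutine counts of all stack groups in a
// debug=1 goroutine profile that contain a frame matching the substring.
func countGoroutinesWithFrame(profile, frame string) int {
	total, groupCount, matched := 0, 0, false
	sc := bufio.NewScanner(strings.NewReader(profile))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "#"):
			// Frame line: count each group at most once.
			if !matched && strings.Contains(line, frame) {
				total += groupCount
				matched = true
			}
		case strings.Contains(line, " @ "):
			// Group header: "<count> @ <pc> <pc> ...".
			n, err := strconv.Atoi(strings.SplitN(line, " @ ", 2)[0])
			if err != nil {
				n = 0
			}
			groupCount, matched = n, false
		}
	}
	return total
}

func main() {
	// Hand-written stand-ins for the start and end profiles.
	start := strings.Join([]string{
		"goroutine profile: total 2",
		"2 @ 0x1 0x2",
		"#\t0x1\tpkg/flow.Transmit+0x10\tflow.go:10",
	}, "\n")
	end := start + "\n\n" + strings.Join([]string{
		"108 @ 0x4 0x5",
		"#\t0x4\tgithub.com/blugelabs/bluge/index.analysisWorker+0x20\twriter.go:667",
		"#\t0x5\tgithub.com/blugelabs/bluge/index.OpenWriter.func1+0x1\twriter.go:77",
	}, "\n")

	const frame = "index.analysisWorker"
	delta := countGoroutinesWithFrame(end, frame) - countGoroutinesWithFrame(start, frame)
	fmt.Printf("%s: %+d\n", frame, delta)
}
```

As a consistency check on the numbers above: 108 analysis workers plus ~44 orchestration goroutines gives 152, which matches the overall growth of 708 - 556.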
   
   ## Hypothesis
   
   When a tsTable rotates to a new daily segment, BanyanDB opens a new bluge 
index writer for that segment but does not close writers for older segments 
that are no longer being written to. Each leftover writer keeps its 
`analysisWorker` goroutine pool alive.
   
   Over weeks/months the growth would be linear in segment count, eventually 
pressuring the Go scheduler and overall memory footprint.
   
   ## Suggested fix direction
   
   Close bluge index writers for segments outside the current write window. 
This may already be the intent of segment-lifecycle hooks in 
`banyand/internal/storage/`; the leak suggests a missing close path on the 
bluge writer specifically.
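
To make the proposed direction concrete, here is a toy model of the lifecycle, not BanyanDB or bluge code. `toyWriter`, `segmentWriters`, and the `keep` parameter are all hypothetical; the pool size of 54 is taken from the profile above. Each rotation opens a writer with its own worker pool and closes writers that fall outside the retained write window:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// toyWriter is a hypothetical stand-in for an index writer that owns a pool
// of worker goroutines for its whole lifetime.
type toyWriter struct {
	jobs chan func()
	wg   sync.WaitGroup
}

func openToyWriter(workers int, live *atomic.Int64) *toyWriter {
	w := &toyWriter{jobs: make(chan func())}
	for i := 0; i < workers; i++ {
		w.wg.Add(1)
		live.Add(1)
		go func() {
			defer w.wg.Done()
			defer live.Add(-1) // runs before wg.Done, so Close sees the final count
			for job := range w.jobs {
				job()
			}
		}()
	}
	return w
}

// Close stops the pool and waits until every worker has exited.
func (w *toyWriter) Close() {
	close(w.jobs)
	w.wg.Wait()
}

// segmentWriters tracks one writer per segment, keyed by day number.
type segmentWriters struct {
	live atomic.Int64
	open map[int]*toyWriter
}

func newSegmentWriters() *segmentWriters {
	return &segmentWriters{open: map[int]*toyWriter{}}
}

// rotate opens a writer for the new day and closes writers for segments
// that are at least keep days old.
func (s *segmentWriters) rotate(day, workers, keep int) {
	s.open[day] = openToyWriter(workers, &s.live)
	for d, w := range s.open {
		if d <= day-keep {
			w.Close()
			delete(s.open, d)
		}
	}
}

func (s *segmentWriters) liveWorkers() int64 { return s.live.Load() }

func main() {
	leaky, fixed := newSegmentWriters(), newSegmentWriters()
	for day := 0; day < 3; day++ {
		leaky.rotate(day, 54, 1<<30) // window so large that nothing is ever closed
		fixed.rotate(day, 54, 1)     // only the current segment keeps its writer
	}
	fmt.Println("live workers without close path:", leaky.liveWorkers())
	fmt.Println("live workers with close path:", fixed.liveWorkers())
}
```

In this model the worker count without a close path grows by one pool per rotation, while the variant with a close path stays at a single pool. A real fix would also have to handle late writes and queries against older segments, which this sketch ignores.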
   
   ## Environment
   
   - Standalone mode, single-node deployment.
   - `SegmentInterval: 1 day`, default flush timeouts.
   - Host: 32-core Linux, no swap.
   - Sustained write rate ~1 req/s through gRPC.

