hanahmily opened a new issue, #13874: URL: https://github.com/apache/skywalking/issues/13874
## Summary

During a 48-hour soak of standalone BanyanDB, goroutine count grew from 556 to 708 (+27%). Root cause is bluge index writers spawning their analysis-worker pools on each segment rotation without releasing them when the writer goes idle.

## Reproduction

1. Run standalone BanyanDB with a measure group using `SegmentInterval: 1 day`.
2. Drive sustained write traffic for ≥48 h (any continuous write workload that crosses a UTC midnight reproduces; in our setup we used SkyWalking OAP traffic plus a synthetic ~1000-row/day fixture).
3. Sample `/debug/pprof/goroutine?debug=1` every 30 min.

## Observed pattern

| Time (UTC) | Goroutines | Δ | Note |
|---|---|---|---|
| t0 | 556 | n/a | baseline |
| t0 + ~21 h (first UTC midnight crossed) | 556 → 632 | +76 | first segment rotation event |
| t0 + ~45 h (second UTC midnight crossed) | 632 → 708 | +76 | second event, identical shape |

The two events are spaced exactly 24 h apart and add ~76 goroutines each. Between events, goroutine count is **flat to within ±1**: no steady leak, only a daily step.

## Stack analysis

Diffing the goroutine profile between start and end:

- **+108 in `bluge/index.analysisWorker`** at `github.com/blugelabs/bluge/index.OpenWriter.func1` (writer.go:77 → analysisWorker at writer.go:667).
- The remaining ~44 are orchestration goroutines around the new writers (waiters, transmit loops).
- Every other stack signature (e.g. `pkg/flow.Transmit`, `grpc/internal/grpcsync.CallbackSerializer.run`) is **identical count** start vs end.

The 108 = 2 events × ~54 analysisWorkers per writer. bluge sizes this pool from GOMAXPROCS (host had 32 CPUs, default pool size ~54), so per-event growth scales with the worker host's GOMAXPROCS.

## Hypothesis

When a tsTable rotates to a new daily segment, BanyanDB opens a new bluge index writer for that segment but does not close writers for older segments that are no longer being written to. Each leftover writer keeps its `analysisWorker` goroutine pool alive.
Over weeks/months the growth would be linear in segment count, eventually pressuring the Go scheduler and overall memory footprint.

## Suggested fix direction

Close bluge index writers for segments outside the current write window. This may already be the intent of segment-lifecycle hooks in `banyand/internal/storage/`; the leak suggests a missing close path on the bluge writer specifically.

## Environment

- Standalone mode, single-node deployment.
- `SegmentInterval: 1 day`, default flush timeouts.
- Host: 32-core Linux, no swap.
- Sustained write rate ~1 req/s through gRPC.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
