Re: [PR] Replace DataCarrier with BatchQueue for metrics pipeline [skywalking]

via GitHub Sat, 21 Feb 2026 19:37:43 -0800


wu-sheng commented on PR #13703:
URL: https://github.com/apache/skywalking/pull/13703#issuecomment-3940068640


   ## Benchmark: BatchQueue Thread Pool Analysis
   
   Ran the thread dump benchmark (`benchmarks/run.sh run cluster_oap-banyandb thread-analysis`) against commit 0518b557db on a 2-node OAP cluster with Istio ALS + Bookinfo (~5 RPS).
   
   ### Environment
   - **OAP**: 2 replicas, BanyanDB standalone, Istio ALS (`k8s-mesh`)
   - **JRE**: OpenJDK 11.0.30 Temurin (aarch64, Ubuntu 24.04)
   - **K8s**: Kind 0.31.0, K8s 1.34
   - **Traffic**: Bookinfo sample app via Envoy sidecars, ~5 RPS
   
   ### BatchQueue Thread Pools
   
   **Both OAP pods show identical, stable BatchQueue thread counts across all 5 
dump rounds (60s apart):**
   
   | BatchQueue Pool | Threads | States | Notes |
   |---|---|---|---|
   | `BatchQueue-METRICS_L1_AGGREGATION` | **10** | WAITING(9) TIMED_WAITING(1) | 9 consumer threads idle, 1 producer ticking |
   | `BatchQueue-METRICS_L2_PERSISTENCE` | **4** | WAITING(2-3) TIMED_WAITING(1-2) | Stable at 4 across all rounds |
   | `BatchQueue-TOPN_PERSISTENCE` | **1** | TIMED_WAITING(1) | Single thread, idle |
   | `BatchQueue-GRPC_REMOTE_*` | **1** | TIMED_WAITING(1) | One per remote OAP node |
   
   **Thread count trend (identical on both pods):**
   ```
   Pool Name                              #1    #2    #3    #4    #5
   BatchQueue-GRPC_REMOTE_*                1     1     1     1     1
   BatchQueue-METRICS_L1_AGGREGATION      10    10    10    10    10
   BatchQueue-METRICS_L2_PERSISTENCE       4     4     4     4     4
   BatchQueue-TOPN_PERSISTENCE             1     1     1     1     1
   ```
   
   ### Key Observations
   
   1. **Thread counts are perfectly stable** — no growth or leaks across 5 
rounds (~5 min total)
   2. **Both pods are symmetric** — identical BatchQueue thread counts
   3. **Consumer threads are properly idle** — mostly in WAITING/TIMED_WAITING, 
no busy-spinning
   4. **Total OAP thread count**: Pod 1 settled at ~275-302 threads and Pod 2 at ~84-101 threads. The difference comes from the `armeria-common-blocking-tasks` pool, which is active only on Pod 1 because that pod handles more gRPC traffic
   5. **16 BatchQueue threads per pod** (10 + 4 + 1 + 1) for the metrics 
pipeline under ~5 RPS load
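
The stability and per-pod total claims above can be checked mechanically. The sketch below hard-codes the per-round counts from the trend table; the helper names `unstable_pools` and `total_per_round` are hypothetical and only illustrate the check.

```python
# Per-round thread counts copied from the trend table (rounds #1..#5).
TREND = {
    "BatchQueue-GRPC_REMOTE_*": [1, 1, 1, 1, 1],
    "BatchQueue-METRICS_L1_AGGREGATION": [10, 10, 10, 10, 10],
    "BatchQueue-METRICS_L2_PERSISTENCE": [4, 4, 4, 4, 4],
    "BatchQueue-TOPN_PERSISTENCE": [1, 1, 1, 1, 1],
}


def unstable_pools(trend):
    """Return the pools whose thread count changed between any two rounds."""
    return sorted(pool for pool, counts in trend.items() if len(set(counts)) > 1)


def total_per_round(trend):
    """Sum the thread counts of all pools for each round."""
    return [sum(column) for column in zip(*trend.values())]


print(unstable_pools(TREND))   # [] means no growth or shrinkage in any pool
print(total_per_round(TREND))  # [16, 16, 16, 16, 16]
```

An empty list from `unstable_pools` corresponds to observation 1, and the constant total of 16 corresponds to observation 5.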


