hanahmily opened a new pull request, #1161:
URL: https://github.com/apache/skywalking-banyandb/pull/1161

   ## What
   
   Redesign the `queue_pub` / `queue_sub` Prometheus metrics into a single 
uniform model that answers the two questions operators actually ask — **which 
group/operation is slow or erroring**, and **what is the cluster call topology 
(liaison ↔ data hot/warm/cold)** — instead of the previous ~20 overlapping, 
`topic`-labeled instruments.
   
   ### Metric model
   - **Base metrics only:** `total_started`, `total_finished`, `total_latency` 
(now a **histogram**), `total_err`. Plus **file-sync-only** `sent_bytes` (pub) 
/ `received_bytes` (sub).
   - **Labels:** `operation` (`batch-write` / `file-sync` / `query` / 
`control`), `group`, and remote-endpoint labels `remote_node` / `remote_role` / 
`remote_tier`; `total_err` adds `error_type`. The `topic` label is removed.
   - **Topology:** `remote_node` equals the BanyanDB node `metadata.name`, so a 
series joins 1:1 with `/cluster/topology` `nodes[]` and 
`calls[].source/target`. The local end is the scrape target. Each edge is 
reconstructable from both ends (pub on the source carries 
`remote_node=<target>`; sub on the target carries `remote_node=<sender>`).
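
The "reconstructable from both ends" rule can be sketched in a few lines of Go. This is only an illustration: the `Series` and `Edge` types, the `EdgeOf` helper, and the node names are hypothetical and not part of BanyanDB. It assumes the scrape target identifies the local node, as described above.

```go
package main

import "fmt"

// Series is a hypothetical, simplified view of one scraped series:
// the node that exposed it, which side emitted it, and its remote_node label.
type Series struct {
	LocalNode  string // scrape target, i.e. the local end of the call
	Side       string // "pub" or "sub"
	RemoteNode string // value of the remote_node label
}

// Edge is a directed call from Source to Target.
type Edge struct {
	Source, Target string
}

// EdgeOf maps a series to the edge it describes.
// pub runs on the source, so remote_node names the target;
// sub runs on the target, so remote_node names the sender.
func EdgeOf(s Series) (Edge, error) {
	switch s.Side {
	case "pub":
		return Edge{Source: s.LocalNode, Target: s.RemoteNode}, nil
	case "sub":
		return Edge{Source: s.RemoteNode, Target: s.LocalNode}, nil
	default:
		return Edge{}, fmt.Errorf("unknown side %q", s.Side)
	}
}

func main() {
	// Hypothetical node names; real values would be each node's metadata.name.
	pub := Series{LocalNode: "liaison-0", Side: "pub", RemoteNode: "data-hot-0"}
	sub := Series{LocalNode: "data-hot-0", Side: "sub", RemoteNode: "liaison-0"}
	e1, _ := EdgeOf(pub)
	e2, _ := EdgeOf(sub)
	fmt.Println("same edge:", e1 == e2, e1)
}
```

Both series should resolve to the same `Edge`, which is what lets a pub-side and a sub-side view of one call be cross-checked against `calls[].source/target`.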
   
   ### Wire changes (additive, backward-compatible)
   - `cluster.v1.SendRequest` gains `group` (per message) and `sender_node` / 
`sender_role` / `sender_tier` (stamped on the first frame of a stream).
   - `cluster.v1.SyncMetadata` gains `sender_*`.
   - Pub-side `remote_role` / `remote_tier` are resolved from the connection 
registry; sub-side values come from the wire `sender_*` fields. A node's own 
identity is taken from the metadata current node.
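
As a rough sketch of how a receiver might use these fields: the sender identity arrives once on the first frame, while `group` arrives on every message. The `Frame`, `Sender`, and `StreamLabels` types below are hypothetical stand-ins, not the generated `cluster.v1` types or the actual sub-side code, and the values are made up.

```go
package main

import "fmt"

// Frame is a hypothetical stand-in for one message on a stream. It is NOT the
// generated cluster.v1.SendRequest; only the field names mirror the description.
type Frame struct {
	Group      string // present on every message
	SenderNode string // stamped on the first frame of a stream only
	SenderRole string
	SenderTier string
}

// Sender is the identity remembered for one stream.
type Sender struct {
	Node, Role, Tier string
}

// StreamLabels caches the sender identity seen on the first frame.
type StreamLabels struct {
	sender Sender
	seen   bool
}

// Observe returns the label set for a frame. The remote_* labels always come
// from the first frame; group comes from the current frame.
func (s *StreamLabels) Observe(f Frame) map[string]string {
	if !s.seen {
		s.sender = Sender{Node: f.SenderNode, Role: f.SenderRole, Tier: f.SenderTier}
		s.seen = true
	}
	return map[string]string{
		"group":       f.Group,
		"remote_node": s.sender.Node,
		"remote_role": s.sender.Role,
		"remote_tier": s.sender.Tier,
	}
}

func main() {
	var st StreamLabels
	first := Frame{Group: "g1", SenderNode: "liaison-0", SenderRole: "liaison"}
	later := Frame{Group: "g2"} // no sender_* after the first frame
	fmt.Println(st.Observe(first))
	fmt.Println(st.Observe(later))
}
```

Because the new fields are additive, a frame from an older sender would simply leave them empty; how the real implementation labels that case is not stated here.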
   
   ## Breaking change
   The previous `queue_*` metric/label names are removed: `*_total_msg_*`, 
`queue_pub_send_*`, the inflight/retry/backoff gauges, `chunked_sync_*` / 
`chunk_reorder_*`, and the `topic` label. Dashboards/alerts referencing them 
must be updated. (CHANGES.md updated.)
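
One way to find affected dashboards is to scan their query expressions for the removed name fragments. The helper below is a heuristic sketch, not a tool shipped with this PR: `StaleRefs` is a hypothetical name, the fragments are copied from the list above, the inflight/retry/backoff gauges are left out because their exact names are not given here, and the metric names in `main` are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// removedFragments are substrings taken from the removed families listed above.
// Substring matching is used so that any exporter namespace prefix is tolerated.
var removedFragments = []string{
	"_total_msg_",
	"queue_pub_send_",
	"chunked_sync_",
	"chunk_reorder_",
}

// topicMatcher matches `topic` used as a label matcher (topic=, topic!=, topic=~, topic!~).
// It does not catch other uses such as `by (topic)`.
var topicMatcher = regexp.MustCompile(`\btopic\s*(=|!)`)

// StaleRefs returns the removed fragments that a query expression still references.
func StaleRefs(expr string) []string {
	var hits []string
	for _, f := range removedFragments {
		if strings.Contains(expr, f) {
			hits = append(hits, f)
		}
	}
	if topicMatcher.MatchString(expr) {
		hits = append(hits, "topic label")
	}
	return hits
}

func main() {
	// Both metric names are hypothetical examples, not confirmed series names.
	old := `rate(queue_pub_send_total{topic="x"}[5m])`
	updated := `rate(queue_pub_total_finished{operation="query"}[5m])`
	fmt.Println(StaleRefs(old))
	fmt.Println(StaleRefs(updated))
}
```

An empty result only means none of the listed fragments appear; it does not prove the query is valid against the new model.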
   
   ## Verification
   - `make pre-push` clean for this change: `generate` + `generate-test-cases` 
+ `check` leave the tree consistent (no codegen drift), and `lint` passes. 
(`vuln-check` flags a pre-existing repo-wide advisory in the indirect dep 
`golang.org/x/[email protected]`, unrelated to this PR.)
   - Unit tests: `go test ./banyand/queue/... ./api/data/...` green.
   - **Live cluster (SkyWalking showcase):** deployed and confirmed end-to-end 
— new families present with the new labels, all old families gone, all four 
`operation` values flowing, `remote_node` matches `/cluster/topology` exactly, 
and the liaison↔data(hot) edge reconstructs from both the pub 
(`remote_role=data`, `remote_tier=hot`) and sub (`remote_role=liaison`) sides.

