hanahmily opened a new pull request, #1161: URL: https://github.com/apache/skywalking-banyandb/pull/1161
## What

Redesign the `queue_pub` / `queue_sub` Prometheus metrics into a single uniform model that answers the two questions operators actually ask, **which group/operation is slow or erroring** and **what is the cluster call topology (liaison ↔ data hot/warm/cold)**, instead of the previous ~20 overlapping, `topic`-labeled instruments.

### Metric model

- **Base metrics only:** `total_started`, `total_finished`, `total_latency` (now a **histogram**), `total_err`. Plus **file-sync-only** `sent_bytes` (pub) / `received_bytes` (sub).
- **Labels:** `operation` (`batch-write` / `file-sync` / `query` / `control`), `group`, and remote-endpoint labels `remote_node` / `remote_role` / `remote_tier`; `total_err` adds `error_type`. The `topic` label is removed.
- **Topology:** `remote_node` equals the BanyanDB node `metadata.name`, so a series joins 1:1 with `/cluster/topology` `nodes[]` and `calls[].source/target`. The local end is the scrape target. Each edge is reconstructable from both ends (pub on the source carries `remote_node=<target>`; sub on the target carries `remote_node=<sender>`).

### Wire changes (additive, backward-compatible)

- `cluster.v1.SendRequest` gains `group` (per message) and `sender_node` / `sender_role` / `sender_tier` (stamped on the first frame of a stream).
- `cluster.v1.SyncMetadata` gains `sender_*`.
- Pub-side `remote_role` / `remote_tier` are resolved from the connection registry; sub-side from the wire `sender_*`. Self identity is wired from the metadata current node.

## Breaking change

The previous `queue_*` metric/label names are removed: `*_total_msg_*`, `queue_pub_send_*`, the inflight/retry/backoff gauges, `chunked_sync_*` / `chunk_reorder_*`, and the `topic` label. Dashboards/alerts referencing them must be updated. (CHANGES.md updated.)

## Verification

- `make pre-push` clean for this change: `generate` + `generate-test-cases` + `check` leave the tree consistent (no codegen drift), and `lint` passes.
  (`vuln-check` flags a pre-existing repo-wide advisory in the indirect dep `golang.org/x/[email protected]`, unrelated to this PR.)
- Unit tests: `go test ./banyand/queue/... ./api/data/...` green.
- **Live cluster (SkyWalking showcase):** deployed and confirmed end-to-end: new families present with the new labels, all old families gone, all four `operation` values flowing, `remote_node` matches `/cluster/topology` exactly, and the liaison↔data(hot) edge reconstructs from both the pub (`remote_role=data`, `remote_tier=hot`) and sub (`remote_role=liaison`) sides.
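
The both-ends edge reconstruction described under **Topology** can be illustrated with a small Go sketch. This is not code from this PR: the `Sample`, `Edge`, `Seen`, `EdgeOf`, and `Reconstruct` names and the node names (`liaison-0`, `data-hot-0`, `data-hot-1`) are hypothetical, and the sketch assumes the scrape target name equals the node `metadata.name`. It only shows how a pub-side and a sub-side series map onto the same directed edge.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a hypothetical, simplified view of one scraped series:
// the scrape target (local end) plus the labels needed to place it on an edge.
type Sample struct {
	Side       string // "pub" or "sub"
	LocalNode  string // scrape target; assumed to equal metadata.name
	RemoteNode string // value of the remote_node label
	Operation  string // value of the operation label
}

// Edge is a directed call from Source to Target for one operation.
type Edge struct {
	Source, Target, Operation string
}

// Seen records which side(s) reported an edge.
type Seen struct {
	Pub, Sub bool
}

// EdgeOf maps a sample to the directed edge it describes:
// pub on the source carries remote_node=<target>,
// sub on the target carries remote_node=<sender>.
func EdgeOf(s Sample) (Edge, bool) {
	switch s.Side {
	case "pub":
		return Edge{Source: s.LocalNode, Target: s.RemoteNode, Operation: s.Operation}, true
	case "sub":
		return Edge{Source: s.RemoteNode, Target: s.LocalNode, Operation: s.Operation}, true
	}
	return Edge{}, false // unknown side: skip
}

// Reconstruct groups samples by edge and notes which ends confirmed it.
func Reconstruct(samples []Sample) map[Edge]Seen {
	out := make(map[Edge]Seen)
	for _, s := range samples {
		e, ok := EdgeOf(s)
		if !ok {
			continue
		}
		v := out[e]
		if s.Side == "pub" {
			v.Pub = true
		} else {
			v.Sub = true
		}
		out[e] = v
	}
	return out
}

func main() {
	samples := []Sample{
		{Side: "pub", LocalNode: "liaison-0", RemoteNode: "data-hot-0", Operation: "batch-write"},
		{Side: "sub", LocalNode: "data-hot-0", RemoteNode: "liaison-0", Operation: "batch-write"},
		{Side: "pub", LocalNode: "liaison-0", RemoteNode: "data-hot-1", Operation: "query"},
	}
	edges := Reconstruct(samples)

	// Sort by target so the printed order is stable.
	keys := make([]Edge, 0, len(edges))
	for e := range edges {
		keys = append(keys, e)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].Target < keys[j].Target })
	for _, e := range keys {
		fmt.Printf("%s -> %s [%s] pub=%t sub=%t\n",
			e.Source, e.Target, e.Operation, edges[e].Pub, edges[e].Sub)
	}
}
```

In this sample data the `liaison-0 -> data-hot-0` edge is confirmed from both ends, while `liaison-0 -> data-hot-1` is seen only on the pub side. In a real cluster that kind of mismatch could mean a missing scrape or a node that has not yet been upgraded.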
