This is an automated email from the ASF dual-hosted git repository.

wu-sheng pushed a commit to branch swip-15-container-level-instance
in repository https://gitbox.apache.org/repos/asf/skywalking.git
commit 70b36ff88c8e2979394a632f96718d270b426846
Author: Wu Sheng <[email protected]>
AuthorDate: Wed Jun 10 16:23:17 2026 +0800

    SWIP-15: container-level instances; sync with horizon-ui structured visibleWhen and MAL semantics
---
 docs/en/swip/SWIP-15.md | 464 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 322 insertions(+), 142 deletions(-)

diff --git a/docs/en/swip/SWIP-15.md b/docs/en/swip/SWIP-15.md
index c72248558e..313d829758 100644
--- a/docs/en/swip/SWIP-15.md
+++ b/docs/en/swip/SWIP-15.md
@@ -28,31 +28,35 @@ Three things changed underneath it:
 3. **SkyWalking replaced the bundled booster UI with the Horizon UI.** The OAP backend no longer ships
    dashboard JSON (dropped in #13877); BanyanDB has not yet been ported to Horizon UI at all. Horizon UI
    is config-driven, has a real **Service → Instance → Endpoint** hierarchy, surfaces per-instance
-   attributes, and can hide panels that have no data — and, with a small enhancement, can drive panel
-   visibility from instance **attributes** (role / tier).
+   attributes, and gates widget visibility through structured, server-evaluated `visibleWhen`
+   predicates — data presence and instance-**attribute** equality ship today (horizon-ui #46); a small
+   extension (membership / negation operators) completes role/tier-driven dashboards.
 
 The current feature does none of this. It models **each node as its own `Service`**
 (`service(['host_name'], Layer.BANYANDB)`), so a cluster appears as a pile of unrelated services; it
-never models the cluster, the node role, the tier, or the group; and it still references metrics that
-BanyanDB removed (an `etcd`-era operation rate, a Prometheus `up`-derived "active instances", and the
-pre-refactor `queue_sub_total_msg_sent_err` family).
+never models the cluster, the node role, the tier, or the group; and it still ships stale or misleading +metrics (an operation rate still named after the retired `etcd` registry, a Prometheus `up`-derived +"active instances" that under the FODC proxy would describe the proxy rather than any node, and the +`queue_sub_total_msg_sent_err` family, which BanyanDB removed). **This SWIP proposes to discard that model and rebuild BanyanDB self-observability around the cluster / node / group reality**, matching the upstream FODC-proxy metric catalog, and to design the Horizon UI -side — a net-new BanyanDB layer dashboard whose node view **adapts to the selected node's role and -tier** — including the one Horizon UI enhancement that makes attribute-driven dashboards possible. +side — a net-new BanyanDB layer dashboard whose instance view **adapts to the selected container's +role and tier** — including the small Horizon UI entity-gate extension that completes attribute-driven +dashboards. ### Goals - Model a BanyanDB **cluster** as a single SkyWalking `Service`. -- Model each **node** as a `ServiceInstance`, carrying its **role** and **tier** as instance - attributes, so the UI can show "what this node is". +- Model each **container** (`pod_name` + `container_name`) as a `ServiceInstance`, carrying its + **role** and **tier** as instance attributes, so the UI can show "what this container is". - Model each **group** as an `Endpoint` of the cluster. - Mirror the upstream FODC-proxy metric catalog faithfully (the two-dashboard split becomes the Instance and Endpoint views). -- Make the **node dashboard dynamic** — a liaison node shows ingestion/queue/publish panels, a data - node shows storage/index/subscribe panels, and the tier refines the data view — first via the - data-presence mechanism that already exists, then via a proposed attribute predicate. 
+- Make the **instance dashboard dynamic** — a liaison container shows ingestion/queue/publish panels, a + data container shows storage/index/subscribe panels, a lifecycle container shows migration panels, and + the tier refines the data view — via the structured `visibleWhen` gates Horizon UI already evaluates + (data presence + attribute equality), completed by a proposed membership/negation extension. ### Non-goals @@ -72,10 +76,10 @@ tier** — including the one Horizon UI enhancement that makes attribute-driven ──────────────── ───────────── ────────── ┌─ liaison node ─┐ FODC agent ┐ BANYANDB layer │ :2121 /metrics │ (sidecar) │ ┌───────────────────────┐ - └────────────────┘ │ │ Root: cluster list │ - ┌─ data hot ─────┐ FODC agent ├─► FODC proxy ──► OTel Collector ──► │ Service: cluster KPIs │ - │ :2121 /metrics │ (sidecar) │ :17913 (prometheus recv, │ Instance: node, panels│ - └────────────────┘ │ /metrics adds `cluster` │ adapt to role/tier │ + └────────────────┘ │ │ Root: cluster list │ + ┌─ data hot ─────┐ FODC agent ├─► FODC proxy ──► OTel Collector ──► │ Service: cluster KPIs │ + │ :2121 /metrics │ (sidecar) │ :17913 (prometheus recv, │ Instance: container, │ + └────────────────┘ │ /metrics adds `cluster` │ adapts to role/tier │ ┌─ data warm ────┐ FODC agent │ single target, label) ──OTLP──► │ Endpoint: group │ │ :2121 /metrics │ (sidecar) │ per-node labels │ └───────────────────────┘ └────────────────┘ ┘ node_role/pod_name/ │ ▲ @@ -83,7 +87,7 @@ tier** — including the one Horizon UI enhancement that makes attribute-driven │ :2121 /metrics │ node_type receiver-otel ──► MAL │ GraphQL └────────────────┘ otel-rules/banyandb/* ───────────┘ execExpression ├ banyandb-service.yaml → Service (cluster) - ├ banyandb-instance.yaml → Instance (node + attrs) + ├ banyandb-instance.yaml → Instance (container + attrs) └ banyandb-endpoint.yaml → Endpoint (group) │ ▼ @@ -100,41 +104,62 @@ identity. ### 1. 
Entity model -| SkyWalking entity | BanyanDB concept | Identity source (label) | -| ---------------------------- | --------------------------------------------- | --------------------------------------- | -| `Service` (Layer `BANYANDB`) | one BanyanDB **cluster** | `cluster` (injected by the collector) | -| `ServiceInstance` | one **node** | `pod_name` (e.g. `banyandb-data-hot-0`) | -| ↳ attribute `node_role` | node **role** | `container_name` (`liaison` / `data`) | -| ↳ attribute `node_type` | data-node **tier** | `node_type` (`hot` / `warm` / `cold`) | -| `Endpoint` | one **group** (storage partition) | `group` (`measure-default`, …) | - -A standalone BanyanDB is the degenerate case: one cluster, one node whose role is `standalone` (all -roles co-resident) and no tier. - -**Why role/tier are instance attributes, not separate services or endpoints.** A node's identity is -its `pod_name`; its role and tier are *properties of that node*, which is exactly what -`InstanceTraffic.properties` (the UI "Attributes" panel) is for. Keeping the cluster as the single -service means the node list, the group list, and cluster-wide KPIs all live under one entity the -operator can reason about — and it makes the node dashboard able to adapt to the selected node's -attributes. 
+| SkyWalking entity | BanyanDB concept | Identity source (label) | +| ---------------------------- | ----------------------------------------- | ------------------------------------------------------ | +| `Service` (Layer `BANYANDB`) | one BanyanDB **cluster** | `cluster` (injected by the collector) | +| `ServiceInstance` | one **container** on a node | `pod_name` + `container_name` (composite) | +| ↳ attribute `container_name` | container **role** (discriminator) | `liaison` / `data` / `lifecycle` | +| ↳ attribute `node_type` | data-node **tier** | `hot` / `warm` / `cold` (absent off data) | +| ↳ attribute `node_role` | role enum (coarse) | `ROLE_LIAISON` / `ROLE_DATA` | +| ↳ attribute `pod_name` | host pod (sibling key) | `demo-banyandb-data-hot-0` | +| `Endpoint` | one **group** (storage partition) | `group` (`sw_metricsMinute`, …) | + +All four labels are attached as instance attributes **verbatim** (not renamed), because the Horizon UI +deployment/topology component groups the intra-cluster instance graph by them: `clusterBy` = +`node_role` + `node_type`, `siblingBy` = `pod_name`, `roleBy` = `container_name`. Emitting the raw +label names keeps the OAP attribute bag and the UI grouping config in lockstep. + +**Why the instance is a container, not a `pod_name`.** `pod_name` is **not unique per metrics +emitter**: a data hot/warm pod co-hosts a `lifecycle` migration sidecar that reports under the *same* +`pod_name` (verified on the live cluster — `demo-banyandb-data-hot-0` emits both `container_name=data` +and `container_name=lifecycle`). Keying the instance on `pod_name` alone would silently merge the two +series. 
The instance identity is therefore `pod_name` + `container_name`, and `container_name` — not +`node_role` — is the role discriminator: `node_role` carries only `ROLE_LIAISON` / `ROLE_DATA` on a +healthy cluster (it stays `ROLE_DATA` on the lifecycle sidecar, and the FODC agent maps unresolved or +meta-only nodes to a transient `ROLE_UNSPECIFIED`), whereas `container_name` cleanly separates +`liaison` / `data` / `lifecycle`. A standalone BanyanDB is the degenerate case: one cluster, one node, +one `container_name=standalone`, no tier. + +**Why container/tier are instance attributes, not separate services or endpoints.** A container's role +and tier are *properties of that instance*, which is exactly what `InstanceTraffic.properties` (the UI +"Attributes" panel) is for. Keeping the cluster as the single service means the instance list, the +group list, and cluster-wide KPIs all live under one entity the operator can reason about — and it +makes the instance dashboard able to adapt to the selected container's attributes. ### 2. Scrape source and label scheme (FODC proxy only) SkyWalking scrapes the **FODC proxy `/metrics`** (default `:17913`) as the single Prometheus target. -The proxy aggregates every node's metrics and stamps four identity labels onto each sample (verified in -the FODC agent's `ParseWithNodeLabels`): +The proxy aggregates every container's metrics and stamps four identity labels onto each sample +(verified in the FODC agent's `ParseWithNodeLabels` and against the live cluster): + +| Label | Value | Used for | +| ---------------- | ---------------------------------------------- | ----------------------------------------------------- | +| `pod_name` | node identity, e.g. 
`banyandb-data-hot-0` | instance name (part 1) — **not unique**, see below | +| `container_name` | `liaison` / `data` / `lifecycle` | instance name (part 2) + attribute `container_name` (the role discriminator) | +| `node_role` | raw enum `ROLE_LIAISON` / `ROLE_DATA` (transiently `ROLE_UNSPECIFIED`) | **not** the discriminator — coarser than `container_name`, stays `ROLE_DATA` on the lifecycle sidecar | +| `node_type` | `hot` / `warm` / `cold` (data containers only) | instance attribute `node_type` (tier) | -| Label | Value | Used for | -| ---------------- | -------------------------------------------- | --------------------------------- | -| `pod_name` | full node identity, e.g. `banyandb-data-hot-0` | instance name | -| `container_name` | `liaison` / `data` (the role discriminator) | instance attribute `node_role` | -| `node_role` | raw enum `ROLE_LIAISON` / `ROLE_DATA` | (available; `container_name` preferred for clean values) | -| `node_type` | `hot` / `warm` / `cold` (data nodes only) | instance attribute `node_type` (tier) | +`pod_name` alone does **not** identify an instance: on the live cluster the four data hot/warm pods +each run two containers (`data` + `lifecycle`) under one `pod_name`, so the instance key is +`pod_name` + `container_name`. All original BanyanDB labels are preserved on every sample: `group`, `service`, `method`, `operation`, -`remote_node`, `remote_role`, `remote_tier`, `error_type`, `kind`, `path`, `type`, `name`, `le`, …. The -Prometheus-synthesized `instance` / `job` / `up` describe the **proxy**, not individual nodes — node -liveness is derived from the always-present per-node gauge `banyandb_system_up_time`, never from `up`. +`remote_node`, `remote_role`, `remote_tier`, `error_type`, `kind`, `path`, `type`, `seg`, `shard`, +`le`, …. Note `service` is BanyanDB's internal **data-model module** (`measure` / `stream` / `trace` / +`property` / `group`) — a workload facet, **never** a SkyWalking service identity. 
The +Prometheus-synthesized `instance` / `job` / `up` describe the **proxy**, not individual containers — +node liveness is derived from the always-present per-container gauge `banyandb_system_up_time`, never +from `up`. **Collector scrape job (illustrative — operator configuration, not a shipped file):** @@ -168,26 +193,64 @@ filter: "{ tags -> tags.job_name == 'banyandb-monitoring' }" # banyandb-service.yaml → cluster expSuffix: service(['cluster'], Layer.BANYANDB) -# banyandb-instance.yaml → node, with role + tier as attributes +# banyandb-instance.yaml → container (a node may run >1 container), role + tier as attributes expSuffix: |- service(['cluster'], Layer.BANYANDB) - .instance(['cluster'], '::', ['pod_name'], '', Layer.BANYANDB, - { tags -> ['node_role': tags.container_name, 'node_type': tags.node_type ?: 'n/a'] }) + .instance(['cluster'], '::', ['pod_name', 'container_name'], '@', Layer.BANYANDB, + { tags -> ['node_role': tags.node_role, + 'node_type': tags.node_type ?: 'n/a', + 'pod_name': tags.pod_name, + 'container_name': tags.container_name] }) # banyandb-endpoint.yaml → group expSuffix: endpoint(['cluster'], ['group'], Layer.BANYANDB) ``` -The 6-argument `.instance(...)` overload's properties closure is the standard, precedented mechanism for +The instance key is the pair `['pod_name', 'container_name']` joined by `'@'` (signature +`instance(serviceKeys, serviceDelimiter, instanceKeys, instanceDelimiter, layer, propertiesExtractor)`), +so the four `data` hot/warm pods surface as distinct `…@data` and `…@lifecycle` instances rather than +colliding. The 6-argument overload's properties closure is the standard, precedented mechanism for attaching labels as instance attributes (the same shape used by `k8s-instance.yaml`). The attributes -ride entirely on the scraped labels — no separate update API. +ride entirely on the scraped labels — no separate update API. 
(Two implementation notes: the MAL v2 +grammar supports the Elvis operator inside a map-literal value, but no shipped rule combines the two +yet — the implementation PR should pin this exact closure shape with a compile test. And `language` is +the one reserved property key — the instance query maps it to the language field instead of an +attribute; none of these four labels collides with it.) ### 3. Metric catalog → MAL rules The redesigned rules mirror the upstream FODC-proxy catalog. The two upstream Grafana boards map onto -two SkyWalking scopes — **Nodes → Instance** (per `pod_name`), **Workload → Endpoint** (per `group`) — -plus a small **Service** summary for cluster KPIs. Source metric names below are verified against -BanyanDB `origin/main` (the base of the upstream observability PR). +two SkyWalking scopes — **Nodes → Instance** (per `pod_name` + `container_name`), **Workload → +Endpoint** (per `group`) — plus a small **Service** summary for cluster KPIs. Source metric names +below are verified against the **live demo cluster** — which runs upstream `main` builds (the +validation pull used the showcase-pinned `main` image of 2026-06-09) — and against BanyanDB +`origin/main` source. The upstream observability PR +[#1159](https://github.com/apache/skywalking-banyandb/pull/1159) (open; docs and Grafana dashboards +only, no metric code) documents the same catalog and defines the two boards this design mirrors. + +> **Metric-name prefix (build-critical).** The sketches below drop a common prefix for readability. +> On the wire **every BanyanDB-native family carries the `banyandb_` prefix** (`banyandb_measure_total_written`, +> `banyandb_liaison_grpc_total_started`, `banyandb_system_disk`, …) — the MAL rules must use the full +> prefixed name. The **only** exceptions are the standard Go-runtime and process exporter families +> `go_*` / `process_*`, which are **bare** (no prefix) and are referenced as-is. 
Every error counter +> this catalog references is lazily registered and emits nothing until the first error fires +> (`banyandb_liaison_grpc_total_err`, `banyandb_liaison_grpc_total_stream_msg_received_err`, +> `banyandb_queue_pub_total_err`, the `*_total_sync_loop_err` family), and the lifecycle last-run +> gauges (`banyandb_lifecycle_last_run_*`, BanyanDB #1167) post-date the build the demo pull validated; +> every other cited family was present in that pull. +> +> **Sketch notation (PromQL-flavored).** Source expressions are written PromQL-style for readability; +> the MAL forms differ mechanically. **(1)** No `or vector(0)` guard exists in MAL — nor is one +> needed: an unfired family resolves to the empty sample family, MAL's `+` treats an empty operand as +> identity, and a rule is skipped only when *all* referenced families are absent — so an error sum +> emits as soon as any one term fires, and a fully healthy cluster shows no series at all (dashboards +> should render absent as 0). **(2)** MAL arithmetic joins samples on exact label equality, so each +> term must be aggregated to the same label set (e.g. `.sum(['cluster'])`) before `+`. **(3)** +> `count(...) by (...)` maps to MAL's multi-label `count([...])`; `histogram_quantile(0.99, …_bucket)` +> maps to `.histogram().histogram_percentile([99])` on the `le`-labeled base family (no `_bucket` +> suffix remains after OTLP conversion); and `time() - <metric>` is computed at **ingest** in the MAL +> rule — MAL ships `time()` (the shipped `envoy-ca.yaml` cert-staleness metric is the precedent), +> while MQE has no current-time function, so it cannot be computed at query time. #### 3.1 Service scope — cluster summary (`banyandb-service.yaml`) @@ -195,15 +258,15 @@ BanyanDB `origin/main` (the base of the upstream observability PR). 
| --------------------------- | ------------------------ | ------------------------------------------------------------------------------------------ | | `cluster_write_rate` | cluster writes/s | `rate(measure_total_written) + rate(stream_tst_total_written) + rate(trace_tst_total_written)` | | `cluster_query_rate` | cluster queries/s | `rate(liaison_grpc_total_started{method='query'})` | -| `cluster_error_rate` | cluster errors/min | `liaison_grpc_total_err + _stream_msg_received_err + schema_server_grpc_total_err + queue_pub_total_err + Σ *_total_sync_loop_err` (×60, each `or vector(0)`) | -| `reporting_nodes` | live node count by role | `count(system_up_time) by (container_name)` | +| `cluster_error_rate` | cluster errors/min | `liaison_grpc_total_err + liaison_grpc_total_stream_msg_received_err + schema_server_grpc_total_err + queue_pub_total_err + Σ *_total_sync_loop_err` (×60; all lazily registered — see sketch notation above) | +| `reporting_nodes` | live container count by role | `count(system_up_time) by (container_name)` | | `total_cpu_cores` | cluster CPU capacity | `sum(system_cpu_num)` | | `total_memory_used` | cluster memory used | `sum(system_memory_state{kind='used'})` | | `total_disk_used` | cluster disk used | `sum(system_disk{kind='used'})` | -#### 3.2 Instance scope — per node (`banyandb-instance.yaml`) +#### 3.2 Instance scope — per container (`banyandb-instance.yaml`) -**All roles** (every node emits these — the "Nodes" board): +**All roles** (every container emits these — the "Nodes" board): | Metric (`meter_banyandb_instance_*`) | Source | | ------------------------------------ | ---------------------------------------------------------------- | @@ -218,26 +281,44 @@ BanyanDB `origin/main` (the base of the upstream observability PR). 
| `gc_pause_avg` | `rate(go_gc_duration_seconds_sum) / rate(go_gc_duration_seconds_count)` | | `heap_inuse` / `heap_next_gc` / `alloc_rate` | `go_memstats_heap_inuse_bytes` / `go_memstats_next_gc_bytes` / `rate(go_memstats_alloc_bytes_total)` | -**Liaison-only** (front door; hidden on data nodes — see [§4](#4-dynamic-metrics-by-role-and-tier)): +**Liaison-only** (front door; hidden on data containers — see [dynamic metrics by role and tier](#4-dynamic-metrics-by-role-and-tier)): | Metric (`meter_banyandb_instance_*`) | Source | | ------------------------------------- | ----------------------------------------------------------------------- | | `query_rate_by_service` | `rate(liaison_grpc_total_started{method='query'}) by (service)` | -| `grpc_error_rate` | `rate(liaison_grpc_total_err) by (service, method)` (+ `_stream_msg_received_err`; both lazily registered) | +| `grpc_error_rate` | `rate(liaison_grpc_total_err) by (service, method)` (+ `liaison_grpc_total_stream_msg_received_err`; both lazily registered) | | `non_query_op_rate` | `rate(liaison_grpc_total_started{method!='query'}) by (method)` | | `write_rate` | `rate({measure,stream_tst,trace_tst}_total_written)` | | `publish_throughput` / `publish_latency_p99` | `rate(queue_pub_total_finished) by (operation)` / `histogram_quantile(0.99, …queue_pub_total_latency_bucket)` | | `wqueue_file_parts` / `wqueue_mem_part` / `wqueue_pending` | `{measure,stream_tst,trace_tst}_total_file_parts` / `_total_mem_part` / `_pending_data_count` | -**Data-only** (backend; hidden on liaison nodes): +**Data-only** (backend; hidden on liaison containers): | Metric (`meter_banyandb_instance_*`) | Source | | ----------------------------------------------- | ------------------------------------------------------------------ | | `total_data` | `{measure,stream_tst,trace_tst}_total_file_elements` | | `merge_file_rate` / `merge_file_latency` / `merge_file_partitions` | `rate(*_total_merge_loop_started)` / `…_merge_latency{type='file'}` / 
`…_merged_parts{type='file'}` | -| `series_write_rate` / `series_term_search_rate` / `total_series` | `measure_inverted_index_total_updates` / `_term_searchers_started` / `_doc_count`; `stream_storage_inverted_index_*` | +| `series_write_rate` / `series_term_search_rate` / `total_series` | `measure_inverted_index_total_updates` / `_total_term_searchers_started` / `_total_doc_count`; `stream_storage_inverted_index_*` | | `stream_tst_write_rate` / `stream_tst_term_search_rate` / `stream_tst_total_docs` | `stream_tst_inverted_index_*` | | `queue_sub_throughput` / `queue_sub_latency_p99` (per `operation`) | `rate(queue_sub_total_started/finished) by (operation)` / `histogram_quantile(0.99, …queue_sub_total_latency_bucket) by (operation)` | +| `retention_disk_usage_percent` / `retention_cooldown` | `storage_retention_{measure,stream,trace}_disk_usage_percent` / `_forced_retention_cooldown_seconds` | + +**Lifecycle-only** (the tier-migration sidecar co-located on `hot`/`warm` data pods; `container_name == 'lifecycle'`): + +| Metric (`meter_banyandb_instance_*`) | Source | +| ------------------------------------ | ------------------------------------------------------------------ | +| `lifecycle_cycles` | `banyandb_lifecycle_cycles_total` (cumulative migration cycles) | +| `lifecycle_last_run` | `banyandb_lifecycle_last_run_timestamp_seconds` — epoch of the last cycle's start; "time since last sync" = `time() - <metric>`, computed at ingest in the MAL rule (MQE has no `time()`) | +| `lifecycle_last_run_success` | `banyandb_lifecycle_last_run_success` (`1` = last cycle OK, `0` = failed) | + +> **Lifecycle last-run signals.** The two gauges above were added in BanyanDB +> [#1167](https://github.com/apache/skywalking-banyandb/pull/1167) (merged to `main` on 2026-06-09, +> post-dating the build the demo pull validated) — both are +> stamped on every cycle end (success, error, or panic-recovered), so they drive a "time since last +> sync" staleness panel and a "last sync OK?" 
status panel directly. They emit only **after the first +> migration runs**, so the staleness panel must guard the never-run case. The same PR also stamps the +> lifecycle's sender identity onto its migration publisher, so a destination data node's `queue_sub` +> `remote_node` / `remote_role` / `remote_tier` now identify the migration source (were empty before). #### 3.3 Endpoint scope — per group (`banyandb-endpoint.yaml`) @@ -250,7 +331,7 @@ nodes per group): | `query_latency` | `rate(liaison_grpc_total_latency{method='query'}) / rate(…_started{method='query'}) by (group)` | | `total_data` | `{measure,stream_tst,trace_tst}_total_file_elements by (group)` | | `merge_file_rate` / `merge_file_latency` / `merge_file_partitions` | the merge family `by (group)` | -| `series_write_rate` / `total_series` | inverted-index `_total_updates` / `_doc_count` `by (group)` | +| `series_write_rate` / `total_series` | inverted-index `_total_updates` / `_total_doc_count` `by (group)` | | `queue_throughput` / `queue_latency_p99` | `queue_sub` / `queue_pub` `by (operation, group)` | | `publish_bytes` | `rate(queue_pub_sent_bytes) by (group)` | @@ -262,98 +343,167 @@ nodes per group): ### 4. Dynamic metrics by role and tier -Different roles expose different metrics, so the **node (Instance) dashboard must adapt to the selected -node**. Two mechanisms, layered: +Different roles expose different metrics, so the **instance dashboard must adapt to the selected +container**. Horizon UI's widget `visibleWhen` is a structured, **server-evaluated** gate (the BFF +resolves it against data presence or the selected instance's attributes and returns gated-out widgets +as hidden; legacy free-text predicate strings are no longer parsed and degrade to ungated). Two gate +kinds, layered: -**(a) Data-presence gating — available today, no UI code.** Horizon UI already supports -`visibleWhen: "<metric> has value"` on a widget; a panel whose metric returns all-null self-hides. 
Each -MAL rule only produces samples for nodes that emit its source metric, so liaison-only metrics are simply -absent on data instances and vice-versa. This gives correct adaptive behavior out of the box: +**(a) Data-presence gating — available today, no UI code.** The `mqe`-kind gate hides a widget whose +expression returns no data. Each MAL rule only produces samples for containers that emit its source +metric, so liaison-only metrics are simply absent on data instances and vice-versa. This gives correct +adaptive behavior out of the box: ```jsonc { "id": "wqueue", "title": "Write Queue (wqueue)", "type": "line", "expressions": ["meter_banyandb_instance_wqueue_pending"], - "visibleWhen": "meter_banyandb_instance_wqueue_pending has value" } + "visibleWhen": { "kind": "mqe", "expression": "meter_banyandb_instance_wqueue_pending", "op": "exists" } } ``` -**(b) Attribute predicate — proposed enhancement (see [§6](#6-horizon-ui-enhancement-entity-attribute-predicate)).** +**(b) Attribute gating — equality ships today; membership is the proposed extension (see +[entity-gate membership operators](#6-horizon-ui-enhancement-entity-gate-membership-operators)).** Data-presence can't distinguish "wrong role" from "idle but right role", and it still issues the query. 
-An attribute predicate keys panel visibility directly on the node's `node_role` / `node_type` -attributes: +The `entity`-kind gate keys panel visibility directly on the selected instance's `container_name` / +`node_type` attributes (meaningful on the Instance scope only): ```jsonc -{ "id": "wqueue", "visibleWhen": "#entity.node_role == 'liaison'" } -{ "id": "cold_tier_note", "visibleWhen": "#entity.node_type == 'cold'" } +{ "id": "wqueue", "visibleWhen": { "kind": "entity", "attribute": "container_name", "op": "eq", "value": "liaison" } } +{ "id": "cold_tier_note", "visibleWhen": { "kind": "entity", "attribute": "node_type", "op": "eq", "value": "cold" } } ``` This is the precise, declarative form, and it is the natural way to express tier-specific panels (a -`hot` data node merges constantly; a `cold` node is mostly static). +`hot` data container merges constantly; a `cold` container is mostly static). The landed gate supports +`exists` and case-insensitive `eq`; tier *sets* need the proposed `in` operator — until it lands they +are expressible as duplicated `eq`-gated widget variants. 
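For readers reviewing the gate semantics, the following is a minimal TypeScript sketch of how an `entity`-kind gate could be resolved against an instance's attribute list. It is not the Horizon UI implementation: the type names, the `evalEntityGate` function, and the `values` field of the `in` form are all hypothetical, and the case-insensitive handling of `in` is an assumption carried over from the described behavior of `eq`.

```typescript
// Hypothetical shapes modeled on the structured `visibleWhen` JSON in this section.
// The "in" variant is the proposed extension; its `values` field name is an assumption.
type EntityGate =
  | { kind: "entity"; attribute: string; op: "exists" }
  | { kind: "entity"; attribute: string; op: "eq"; value: string }
  | { kind: "entity"; attribute: string; op: "in"; values: string[] };

interface Attribute {
  name: string;
  value: string;
}

// Returns true when the widget should be shown for the selected instance.
function evalEntityGate(gate: EntityGate, attrs: Attribute[]): boolean {
  const hit = attrs.find((a) => a.name === gate.attribute);
  if (hit === undefined) {
    return false; // attribute absent: every operator fails
  }
  const actual = hit.value.toLowerCase();
  switch (gate.op) {
    case "exists":
      return true;
    case "eq":
      return actual === gate.value.toLowerCase(); // case-insensitive equality
    case "in":
      return gate.values.some((v) => v.toLowerCase() === actual);
  }
}

// Example attribute bag for a hot-tier data container.
const hotData: Attribute[] = [
  { name: "container_name", value: "data" },
  { name: "node_type", value: "hot" },
];

const isData = evalEntityGate(
  { kind: "entity", attribute: "container_name", op: "eq", value: "DATA" },
  hotData,
);
const isHotOrWarm = evalEntityGate(
  { kind: "entity", attribute: "node_type", op: "in", values: ["hot", "warm"] },
  hotData,
);
console.log(isData, isHotOrWarm);
```

Under this sketch, the interim workaround of duplicated `eq`-gated widget variants corresponds to evaluating one `eq` gate per tier value, with exactly one variant visible for any given container.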
Role/tier scoping of the catalog: -| Bucket | Panels | Predicate | -| --------------- | --------------------------------------------------------------------- | --------------------------------- | -| **All roles** | system resources, disk-by-path, network, Go runtime, node uptime | (always shown) | -| **Liaison** | gRPC query & errors, non-query ops, write rate, publish throughput & latency, wqueue depth | `#entity.node_role == 'liaison'` | -| **Data** | storage totals, merge/compaction, inverted index, subscribe queue | `#entity.node_role == 'data'` | -| **Data + tier** | tier-specific merge/retention hints | `#entity.node_type in (hot,warm)` | +| Bucket | Panels | Entity gate | +| --------------- | --------------------------------------------------------------------- | ---------------------------------- | +| **All roles** | system resources, disk-by-path, network, Go runtime, node uptime | (always shown) | +| **Liaison** | gRPC query & errors, non-query ops, write rate, publish throughput & latency, wqueue depth | `container_name eq liaison` | +| **Data** | storage totals, merge/compaction, inverted index, subscribe queue, retention | `container_name eq data` | +| **Data + tier** | tier-specific merge/retention hints | `node_type in (hot, warm)` † | +| **Lifecycle** | migration cycles, last-run time + status | `container_name eq lifecycle` | + +† `in` is the proposed extension of [section 6](#6-horizon-ui-enhancement-entity-gate-membership-operators); +until it lands, two `eq`-gated widget variants. ### 5. Dashboards (Horizon UI BANYANDB layer template) A net-new layer template `apps/bff/src/bundled_templates/layers/banyandb.json` (config-driven JSON, one -file per layer, per-scope widget arrays, MQE expression strings). The design mirrors the upstream two -boards across the SkyWalking hierarchy: +file per layer keyed by its `key` field — `BANYANDB`, filename lowercased — with per-scope widget +arrays and MQE expression strings). 
One menu touchpoint exists: Horizon UI currently hard-codes the +`BANYANDB` layer out of the sidebar (`HIDDEN_LAYERS`); horizon-ui PR #47 replaces that with a +config-driven `layers.excluded` list that un-hides BanyanDB — this SWIP rides on that change (or an +equivalent one-line un-hide). The design mirrors the upstream two boards across the SkyWalking +hierarchy: ``` BANYANDB layer -├─ Root → cluster list (ServiceList), showGroup=false +├─ Root → cluster list (the layer landing's service-list picker: header columns + sort) ├─ Service (cluster) │ └─ Overview KPIs + "Cluster Workload Summary" + "Fleet Overview" capacity │ (cluster_write_rate, cluster_query_rate, cluster_error_rate, │ reporting_nodes by role, total_cpu/memory/disk) -├─ Instance (node) ← the "Nodes" board, made dynamic +├─ Instance (container) ← the "Nodes" board, made dynamic; instance = pod_name@container_name │ ├─ All roles: Resources (CPU/RSS/mem%/disk%), Disk by Path, Network, Go Runtime -│ ├─ Liaison (visibleWhen role==liaison): Ingestion/Query, Registry, Errors, +│ ├─ Liaison (entity gate container_name eq liaison): Ingestion/Query, Registry, Errors, │ │ Publish throughput & p99, Write Queue (wqueue) depth -│ └─ Data (visibleWhen role==data): Storage totals, Merge, Inverted Index, -│ Subscribe Queue (per operation: query/file-sync/batch-write/control) +│ ├─ Data (entity gate container_name eq data): Storage totals, Merge, Inverted Index, Retention, +│ │ Subscribe Queue (per operation: query/file-sync/batch-write/control) +│ └─ Lifecycle (entity gate container_name eq lifecycle): migration cycles, last-run time + status └─ Endpoint (group) ← the "Workload" board, by group └─ Write rate, Query latency, Total data, Merge, Inverted index, Queue, Publish bytes ``` Panel **types/units** follow the upstream Grafana boards for fidelity (stat for KPIs; timeseries for -rates/latencies; table for the per-node health row; `bytes` / `percentunit` / `s` / `reqps` / `wps` -units; disk% and memory% turn red 
at 80%). The per-node "health table" (uptime, CPU cores, RSS, mem%, -disk%) becomes the Instance-list columns on the Service view. +rates/latencies; `bytes` / `percentunit` / `s` / `reqps` / `wps` units; disk% and memory% turn red at +80%). The upstream per-node "health table" (uptime, CPU cores, RSS, mem%, disk%) maps onto the +all-roles Resources widgets of the Instance view — Horizon UI's instance list deliberately shows only +name + attributes (the role/tier chips), and per-instance metric columns are not assumed by this +design; if embedded health columns prove necessary later, that is an additive Horizon UI enhancement. This is **design only** — the production `banyandb.json` and its exact widget grid are deliberately left to the implementation PR in the Horizon UI repository. -### 6. Horizon UI enhancement: `#entity` attribute predicate +### 6. Horizon UI enhancement: entity-gate membership operators -Horizon UI's widget `visibleWhen` already parses two predicate forms but only one is implemented: +When this SWIP was first drafted, Horizon UI parsed `visibleWhen` as free text and stubbed the +entity-attribute form. That is no longer the upstream state: horizon-ui PR #46 (merged 2026-06-08) +replaced the free-text parser with a structured, **BFF-evaluated** union — -- `"<metric> has value"` — implemented (client-side data-presence gating). -- `"#entity.<key>"` — **parsed but stubbed**: the renderer's `isVisible` currently returns `true` - unconditionally for any `#entity.*` predicate, with the comment *"Entity-attribute predicates need an - attributes feed we don't surface yet. 
Render the widget unconditionally for now."* +- `{ "kind": "mqe", "expression": "<expr>", "op": "exists" }` — data-presence gating; +- `{ "kind": "entity", "attribute": "<key>", "op": "exists" }` / + `{ "kind": "entity", "attribute": "<key>", "op": "eq", "value": "<v>" }` — entity-attribute gating + against the selected instance's attribute feed (`eq` compares case-insensitively; meaningful on the + Instance scope only, a no-op elsewhere) -The data is already on the wire: the instance list the UI fetches carries each instance's -`attributes [{name,value}]`. The enhancement is to **wire those attributes into the predicate -evaluator** and give the predicate a small comparison grammar: +— so the attribute feed and the evaluator this section originally proposed **already exist upstream**: +the BFF fetches the selected instance's `attributes [{name,value}]` and returns gated-out widgets as +hidden. Legacy free-text predicates (`"<metric> has value"`, `"#entity.<key>"`) are no longer parsed +and degrade to ungated. -| Predicate form | Meaning | -| --------------------------------------- | -------------------------------------------------- | -| `#entity.<key>` | attribute present and truthy | -| `#entity.<key> == '<v>'` / `!= '<v>'` | equals / not-equals a literal | -| `#entity.<key> in (<v1>,<v2>)` | membership | +What remains for this design is only **membership and negation**: -Scope of the enhancement (design): (1) pass the selected instance's `attributes` into the -`LayerDashboardsView` predicate context; (2) implement the `#entity.*` branch of `isVisible` to read -that context; (3) extend the predicate parser with `==` / `!=` / `in`; (4) document it in the Horizon UI -layer-template authoring docs. 
It is generic — any layer (K8s node roles, gateway tiers, …) benefits; +| Proposed gate | Meaning | +| ------------------------------------------------------------------------------------ | -------------------- | +| `{ "kind": "entity", "attribute": "<key>", "op": "neq", "value": "<v>" }` | not-equals a literal | +| `{ "kind": "entity", "attribute": "<key>", "op": "in", "values": ["<v1>", "<v2>"] }` | membership | + +Scope of the enhancement (design): (1) add the two operator arms to the BFF `visibleWhen` schema and +its entity-gate evaluator; (2) document them in the Horizon UI layer-template authoring docs. Until it +lands, a tier set like `node_type in (hot, warm)` is expressible as two `eq`-gated widget variants — +`in` removes the duplication. It is generic — any layer (K8s node roles, gateway tiers, …) benefits; BanyanDB is the first consumer. The exact code lands in the Horizon UI repository. +### 7. Intra-cluster instance topology (the "deployment" component) + +Beyond the per-instance dashboards, the BanyanDB layer adds a **deployment view**: the +container-to-container call graph *within* the single BanyanDB cluster service — liaison↔data writes, +the hot→warm→cold lifecycle migration chain, and inter-liaison gossip. The legacy booster UI only ever +drew instance topology *between two services*; this is a net-new Horizon UI component for the +**one-service** case (landing via horizon-ui PR #47). + +**Data path — no query API change.** The component calls +`getServiceInstanceTopology(clientServiceId, serverServiceId, duration)` with the **same** service id +on both sides. OAP's relation filter is symmetric, so `client == server == svc` collapses to +`source_service_id == dest_service_id == svc`, returning exactly the intra-cluster instance relations +(verified across the BanyanDB / JDBC / ES topology DAOs). 
Per-node metrics evaluate under +`{ scope: ServiceInstance }`; per-edge metrics under `ServiceInstanceRelation` (server + client +families) — both ordinary MQE. + +**Grouping contract.** The component lays the graph out from the instance attributes this SWIP emits +([entity model](#1-entity-model)): + +| Config key | Attribute(s) | Effect | +| ----------- | ------------------------- | -------------------------------------------------------------- | +| `clusterBy` | `node_role` + `node_type` | one box per role/tier — liaison, data hot/warm/cold | +| `siblingBy` | `pod_name` | a pod = main container + sibling containers (data + lifecycle) | +| `roleBy` | `container_name` | per-role node metrics (`liaison` / `data` / `lifecycle`) | + +Per-role node MQE binds to the `meter_banyandb_instance_*` metrics from the catalog above — e.g. +liaison → `query_rate_by_service`, data → `write_rate` / `disk_usage_percent`, lifecycle → +`lifecycle_cycles` / `lifecycle_last_run_success`. Only `container_name` ∈ +{`liaison`, `data`, `lifecycle`} exists on the wire — there is **no `fodc` container** (the FODC agent +publishes no self-metrics through the proxy), so a `fodc` role is not modeled. + +**Open dependency — a MAL `SERVICE_INSTANCE_RELATION` scope.** This feature is MAL-only: every BanyanDB +entity, metric, and attribute here is produced by the `banyandb/*` MAL rules. MAL builds relations +through `MeterEntity` / `ScopeType`, which ships `SERVICE_RELATION` and `PROCESS_RELATION` (the latter +already powers the eBPF process topology via `network-profiling.yaml`) — but it has **no +`SERVICE_INSTANCE_RELATION` scope** and no `SampleFamily.instanceRelation(...)` builder. So MAL cannot +emit the instance-relation metric that `getServiceInstanceTopology` reads, and on a metrics-only +BanyanDB the deployment graph is **empty** — the Horizon UI component (PR #47) renders that empty +state by design until the scope lands (its earlier preview mock has been dropped). 
+Closing the gap means adding that third relation scope (a `SERVICE_INSTANCE_RELATION` `ScopeType` + +`MeterEntity` factory + `instanceRelation(...)` builder + entity description, mirroring the two that +ship), fed by the queue `remote_node` / `remote_role` / `remote_tier` labels (now carrying the +lifecycle sender identity per BanyanDB #1167). That is MAL-**engine** code (`server-core` + +`meter-analyzer`), which exceeds this SWIP's [config-only non-goals](#non-goals); it is tracked under +[future work](#future-work). The component, the query path, and the grouping contract above are ready +the moment that scope lands. + ## Feasibility and precedent Verified against the OAP and Horizon UI source — **no OAP core / MAL / receiver change is required**: @@ -367,21 +517,34 @@ Verified against the OAP and Horizon UI source — **no OAP core / MAL / receive `EndpointTraffic` whenever the endpoint name is non-empty; `EndpointTraffic` is `supportUpdate=true` and is listed by GraphQL `findEndpoint` (empty keyword ⇒ list all), which the BanyanDB metadata DAO serves from the traffic table without touching any trace data. -- **Layer.** `Layer.BANYANDB` (ordinal 43) already exists; layer dashboards are auto-discovered by the - UI from the template's own `layer` field — no menu code change. +- **Layer.** `Layer.BANYANDB` (ordinal 43) already exists; layer dashboards are auto-discovered from + the template's own `key` field. The one menu touchpoint: Horizon UI's hard-coded hidden-layers set + currently drops `BANYANDB` from the sidebar — un-hidden by horizon-ui PR #47's config-driven + `layers.excluded` (see [Dashboards](#5-dashboards-horizon-ui-banyandb-layer-template)). ## Live validation The entity scheme and the metric catalog above were validated against a **live 7-node BanyanDB cluster** — the public SkyWalking demo's FODC proxy `/metrics` (2 liaison + 5 data: `hot×2`, `warm×2`, -`cold×1`). 
Findings: - -- **All four identity labels are present and exactly as designed.** Every sample carries `pod_name` - (e.g. `demo-banyandb-data-hot-0`), `node_role` (`ROLE_LIAISON` / `ROLE_DATA`), `container_name` - (`liaison` / `data`), and — on **data nodes only** — `node_type` (`hot` / `warm` / `cold`). Liaison - nodes carry no `node_type`, so the instance closure defaults the tier attribute (`tags.node_type ?: - 'n/a'`). This validates Service = `cluster`, Instance = `pod_name`, attributes `node_role` / - `node_type`. +`cold×1`), running an upstream `main` build (the showcase-pinned image of 2026-06-09; upstream PR +[#1159](https://github.com/apache/skywalking-banyandb/pull/1159) — open, docs and Grafana dashboards +only — documents the same catalog). The live `/metrics` pull is the authoritative wire reference. +The pull exposes 393 metric families. Findings: + +- **Instance must be `pod_name` + `container_name`, not `pod_name` alone.** Every sample carries `pod_name`, + `node_role` (`ROLE_LIAISON` / `ROLE_DATA` observed; the FODC agent stamps a transient + `ROLE_UNSPECIFIED` for unresolved or meta-only nodes), `container_name` + (`liaison` / `data` / **`lifecycle`**), and — on **data containers only** — `node_type` + (`hot` / `warm` / `cold`). Crucially, the four `data` hot/warm pods each run **two containers under + one `pod_name`** (`…@data` and `…@lifecycle`), so `pod_name` is not a unique instance key and + `node_role` is not the discriminator (it reads `ROLE_DATA` on the lifecycle sidecar). This validates + Service = `cluster`, Instance = `pod_name` + `container_name`, attributes `container_name` / `node_type`. +- **The `lifecycle` migrator surfaces as its own container instance.** It co-locates on the `hot`/`warm` + data pods and emits `banyandb_lifecycle_cycles_total` plus the shared `system_*` / `go_*` / + `process_*` runtime families — 50 families under `container_name=lifecycle` in the demo pull.
The + `last_run_timestamp_seconds` / `last_run_success` gauges (BanyanDB #1167) post-date the demo's + deployed build, so they were absent from that pull but are present on `main` and emit once a migration + cycle runs (the showcase has since pinned the #1167 merge SHA, so a redeployed demo will expose them). - **The queue model is confirmed verbatim.** `banyandb_queue_sub_*` / `queue_pub_*` carry `operation` ∈ {`batch-write`, `control`, `file-sync`, `query`}, plus `group`, `remote_node`, `remote_role` (`liaison` / `data`) and `remote_tier` (`hot` / …); `total_latency` is a histogram. The @@ -391,16 +554,24 @@ cluster** — the public SkyWalking demo's FODC proxy `/metrics` (2 liaison + 5 `liaison_grpc_total_started{group,method,service}`, `*_total_written{group}`, `*_inverted_index_*{group,seg,node_type}`. Data-node metrics also carry `node_type`, so the by-group endpoint view can be refined by tier. -- **One reconciliation vs. the upstream doc.** Schema/registry operations are **not** exposed as - `banyandb_liaison_grpc_total_registry_*` (those series do not exist on the live cluster) — they are a - **separate `banyandb_schema_server_grpc_*` scope** (`total_started{method}`, `_finished`, `_latency`, - `_err`), running on the nodes hosting the metadata/schema server. The tables above use the - `schema_server_grpc_*` names accordingly. +- **Two registry/schema scopes coexist (corrected).** The live cluster exposes **both** + `banyandb_liaison_grpc_total_registry_*` (`group`, `service`, `method`; on liaison containers) **and** + a separate `banyandb_schema_server_grpc_*` scope (`total_started{method}`, `_finished`, `_latency`, + `_err`; on the data container hosting the metadata/schema server). The `cluster_error_rate` and + registry panels should pick one deliberately — they are different layers, not aliases. (An earlier + draft claimed the `liaison_grpc_total_registry_*` series were absent; BanyanDB `main` has emitted + them since #517.) 
+- **`storage_retention_*` is a real data-only family** not in earlier drafts: + `storage_retention_{measure,stream,trace}_disk_usage_percent{service}` and + `_forced_retention_cooldown_seconds{service}` — the source for the data-container retention panels. - **Error counters are absent on a healthy cluster, by design.** `liaison_grpc_total_err`, - `*_total_sync_loop_err` and `queue_pub_total_err` are label-dimensioned counters that emit no series - until the first error — so the rules must guard each error term with `or vector(0)`, exactly as the - upstream "Error Rate" panel does. Their non-error siblings (`_started` / `_finished` / `_latency` / - `_bytes`) are all present. + `liaison_grpc_total_stream_msg_received_err`, `*_total_sync_loop_err` and `queue_pub_total_err` are + label-dimensioned counters that emit no series until the first error. The upstream Grafana "Error + Rate" panel guards each term with PromQL's `or vector(0)`; the MAL rules need no guard — an absent + family is the identity for MAL's `+` (see the sketch-notation note in + [the metric catalog](#3-metric-catalog--mal-rules)) — the summed metric simply has no series until + the first error fires. Their non-error siblings (`_started` / `_finished` / `_latency` / `_bytes`) + are all present. ## Imported Dependencies libs and their licenses @@ -415,10 +586,11 @@ This is a **breaking change** to the BanyanDB self-observability feature (an int feature, not a public protocol/storage contract): - **Entity model.** A BanyanDB cluster that previously appeared as *N* services (one per node) now - appears as *one* service with *N* instances. Old per-node `Service` entities and their - `meter_banyandb_*` / `meter_banyandb_instance_*` metric series are superseded; the new series use the - cluster/node/group identities and a partly new metric set. Historical data under the old model is not - migrated. 
+ appears as *one* service with one instance **per container** (`pod_name` + `container_name`, so a + data hot/warm pod yields both a `data` and a `lifecycle` instance). Old per-node `Service` entities + and their `meter_banyandb_*` / `meter_banyandb_instance_*` metric series are superseded; the new + series use the cluster/container/group identities and a partly new metric set. Historical data under + the old model is not migrated. - **Scrape target.** Cluster deployments must scrape the **FODC proxy `:17913`** (single target) and inject a `cluster` label. The legacy per-pod `:2121` collector config is replaced. Direct per-pod scraping is **out of scope** for this redesign (a standalone node still reports through its FODC @@ -430,9 +602,9 @@ feature, not a public protocol/storage contract): Horizon UI bundle. - **OAP rule loading** is unchanged: `enabledOtelMetricsRules` already globs `banyandb/*`, so the new `banyandb-endpoint.yaml` is picked up without an `application.yml` change. -- **Horizon UI predicate enhancement is backward compatible** — `#entity.*` only ever returned `true` - before, so implementing it can only *add* hiding behavior to templates that opt in; existing templates - are unaffected. +- **The Horizon UI entity-gate extension is backward compatible** — `neq` / `in` are additive arms of + the structured `visibleWhen` union (horizon-ui #46); templates that don't use them are unaffected, + and legacy free-text predicates already degrade to ungated rather than erroring. ## General usage docs @@ -451,17 +623,25 @@ This is a preliminary usage sketch to help reviewers; the final operator docs (r **What the operator sees** - A **cluster** as a single service, with cluster-wide write/query/error rates and capacity. 
-- A **node list** where each node shows its **role** (`liaison` / `data`) and **tier** - (`hot` / `warm` / `cold`) as attributes; selecting a node shows a dashboard **scoped to what that node - actually does** — ingestion/queue/publish for liaison, storage/index/subscribe for data, refined by - tier. +- An **instance (container) list** where each entry shows its **container** role + (`liaison` / `data` / `lifecycle`) and **tier** (`hot` / `warm` / `cold`) as attributes; selecting one + shows a dashboard **scoped to what that container actually does** — ingestion/queue/publish for + liaison, storage/index/subscribe/retention for data, migration cycles + last-run time/status for + lifecycle, refined by tier. - A **group list** (Endpoints) with per-group throughput, latency, storage, index and queue health. ## Future work -- **Topology / lifecycle.** Fuse FODC `/cluster/topology` (node inventory + roles + tiers) and the queue - `remote_node` / `remote_role` / `remote_tier` labels into a node-to-node call graph, and surface FODC - `/cluster/lifecycle` group settings (shards / segment interval / TTL) on the Endpoint view. +- **A MAL `SERVICE_INSTANCE_RELATION` scope for the deployment component.** Add the third relation scope + (`ScopeType` + `MeterEntity` factory + `SampleFamily.instanceRelation(...)` + entity description, + mirroring the shipping `serviceRelation` / `processRelation`) so the + [intra-cluster instance topology](#7-intra-cluster-instance-topology-the-deployment-component) renders + live instead of showing the empty state, fed by the queue `remote_node` / `remote_role` / `remote_tier` labels + (verified reconstructable from the live data; BanyanDB #1167 also populates the lifecycle migration + sender identity, so hot→warm→cold tier-migration edges are distinguishable). This is MAL-engine code, + beyond this SWIP's config-only scope.
Also + surface FODC `/cluster/topology` and `/cluster/lifecycle` group settings (shards / segment interval / + TTL) on the Endpoint view. - **Alerting.** Ship default alarm rules for the upstream "Key Signals to Watch" (query p99, error rate, disk > 85%, memory near the protector limit, sustained wqueue / `queue_pub` backlog). - **Direct-scrape variant** for standalone / non-FODC deployments, if demand warrants.
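
As a rough illustration of the entity gates discussed in section 6, the TypeScript sketch below evaluates the two shipped operators (`exists`, `eq`) and the two proposed ones (`neq`, `in`) against an instance's `attributes [{name,value}]`. Every type and function name in it (`InstanceAttribute`, `EntityGate`, `evaluateEntityGate`) is hypothetical and not taken from the Horizon UI source. Two behaviors are assumptions rather than documented facts: `neq` and `in` reuse the case-insensitive comparison that `eq` is described as using, and `neq` passes when the attribute is absent.

```typescript
// Hypothetical shapes modeled on the gate JSON quoted in section 6;
// they are NOT copied from the Horizon UI code base.
type InstanceAttribute = { name: string; value: string };

type EntityGate =
  | { kind: "entity"; attribute: string; op: "exists" }
  | { kind: "entity"; attribute: string; op: "eq"; value: string }
  | { kind: "entity"; attribute: string; op: "neq"; value: string } // proposed
  | { kind: "entity"; attribute: string; op: "in"; values: string[] }; // proposed

// Return the first attribute value stored under `key`, if any.
function attributeValue(attributes: InstanceAttribute[], key: string): string | undefined {
  for (const attr of attributes) {
    if (attr.name === key) return attr.value;
  }
  return undefined;
}

// `eq` is described as case-insensitive; reusing that for `neq` / `in` is an assumption.
function sameIgnoringCase(a: string, b: string): boolean {
  return a.toLowerCase() === b.toLowerCase();
}

// Decide whether a widget guarded by `gate` is visible for the selected instance.
function evaluateEntityGate(gate: EntityGate, attributes: InstanceAttribute[]): boolean {
  const actual = attributeValue(attributes, gate.attribute);
  switch (gate.op) {
    case "exists":
      return actual !== undefined;
    case "eq":
      return actual !== undefined && sameIgnoringCase(actual, gate.value);
    case "neq":
      // Assumption: a missing attribute counts as "not equal".
      return actual === undefined || !sameIgnoringCase(actual, gate.value);
    case "in": {
      if (actual === undefined) return false;
      const present: string = actual;
      return gate.values.some((v) => sameIgnoringCase(present, v));
    }
  }
}

// Example: a hot-tier data container checked against the tier gate from section 4.
const dataHot: InstanceAttribute[] = [
  { name: "container_name", value: "data" },
  { name: "node_type", value: "hot" },
];
const tierGate: EntityGate = {
  kind: "entity",
  attribute: "node_type",
  op: "in",
  values: ["hot", "warm"],
};
console.log(evaluateEntityGate(tierGate, dataHot));
```

Under these assumptions, the tier condition `node_type in (hot, warm)` is a single `in` gate, where the interim workaround needs two `eq`-gated widget variants.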
