hanahmily opened a new pull request, #1162: URL: https://github.com/apache/skywalking-banyandb/pull/1162
### Problem

Follow-up to #1161. After the pub/sub metrics redesign, some `banyandb_queue_sub_total_started` / `banyandb_queue_sub_total_finished` series carried an **empty `group`** label. On the live cluster this was isolated to `operation="batch-write"`, and only a subset of it (plain writes kept their group).

### Root cause

`batch-write` spans two payload families:

- `TopicStream/Measure/TraceWrite` publish an `InternalWriteRequest` (`proto.Message`) → `messageToRequest` resolves the group via `GroupFromMessageData`. ✅
- The five **secondary-index** sync topics (`MeasureSeriesIndexInsert/Update`, `StreamSeriesIndexWrite`, `StreamLocalIndexWrite`, `TraceSidxSeriesWrite`) publish **pre-marshaled `[]byte`** payloads. The `[]byte` branch of `messageToRequest` set `r.Body` but never `r.Group`, so `SendRequest.Group` went out empty and the subscriber (which mirrors `writeEntity.GetGroup()`) recorded `group=""`.

The group name is actually known at the publish site (`g.name`); it was only embedded inside the opaque body bytes, not surfaced on the wire field the metrics layer reads.

### Fix

Thread the business group out-of-band on the bus message:

- `pkg/bus`: add `Message.group` + `Group()` getter and `NewMessageWithNodeAndGroup(id, node, group, data)`.
- `banyand/queue/pub/pub.go`: `messageToRequest` sets `r.Group = m.Group()` on the `[]byte` branch; `Broadcast` preserves `messages.Group()` when fanning out per node.
- The four secondary-index publish sites pass `g.name`.

The subscriber needs no change: once the wire field is populated it labels correctly.

### Verification

- New unit test `TestMessageToRequestGroup` (proto→group-from-metadata, `[]byte`+group→set, `[]byte` without group→empty).
- Local CI green on this change: `make build` / `make lint` (0 issues) / `make license-check` / `make check` (clean tree); affected-package tests with `-race` (`pkg/bus`, `banyand/queue/pub`, `banyand/queue/sub`, `api/data`) all pass.
- **Live cluster**: after rolling the liaison, the empty-group `batch-write` increase dropped from ~3479/10min to **0**, and all increments now carry their real group (`sw_metricsMinute`, `sw_trace`, …).
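
For readers who want the shape of the fix at a glance, here is a minimal, self-contained Go sketch of carrying the group out-of-band on the bus message. It is not the actual BanyanDB code. `Message`, `Group()`, `NewMessageWithNodeAndGroup`, `messageToRequest`, and `SendRequest.Group` are names taken from the description above, but their fields and signatures here are simplified assumptions, and `groupedPayload` is a hypothetical stand-in for a proto message whose group can be read from its own metadata.

```go
package main

import "fmt"

// Message is a simplified stand-in for the bus message in pkg/bus.
// The field set and types are assumptions made for this sketch.
type Message struct {
	id    int64
	node  string
	group string
	data  any
}

// NewMessageWithNodeAndGroup mirrors the constructor named in the PR;
// the parameter types are assumed.
func NewMessageWithNodeAndGroup(id int64, node, group string, data any) Message {
	return Message{id: id, node: node, group: group, data: data}
}

// Group returns the business group carried alongside the payload.
func (m Message) Group() string { return m.group }

// Data returns the payload.
func (m Message) Data() any { return m.data }

// groupedPayload is hypothetical: it stands in for a proto message
// whose group can be resolved from the payload itself.
type groupedPayload struct{ group string }

// SendRequest is a simplified stand-in for the wire request.
type SendRequest struct {
	Group string
	Body  []byte
}

// messageToRequest sketches the two payload families described above.
func messageToRequest(m Message) (*SendRequest, error) {
	r := &SendRequest{}
	switch d := m.Data().(type) {
	case groupedPayload:
		// Structured payload: the group comes from the payload metadata.
		// Marshaling the body is omitted in this sketch.
		r.Group = d.group
	case []byte:
		// Pre-marshaled payload: the body is opaque, so the group must
		// come from the message itself.
		r.Body = d
		r.Group = m.Group()
	default:
		return nil, fmt.Errorf("unsupported payload type %T", d)
	}
	return r, nil
}

func main() {
	msg := NewMessageWithNodeAndGroup(1, "node-1", "sw_trace", []byte("opaque"))
	req, err := messageToRequest(msg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("group=%q body=%q\n", req.Group, req.Body)
}
```

Without the `r.Group = m.Group()` assignment in the `[]byte` case, the request in this sketch would go out with an empty group, which corresponds to the `group=""` series described under Root cause.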
