hanahmily opened a new pull request, #1162:
URL: https://github.com/apache/skywalking-banyandb/pull/1162

   ### Problem
   
   Follow-up to #1161. After the pub/sub metrics redesign, some 
`banyandb_queue_sub_total_started` / `banyandb_queue_sub_total_finished` series 
carried an **empty `group`** label. On the live cluster this was isolated to 
`operation="batch-write"`, and only to a subset of those series (plain writes 
kept their group).
   
   ### Root cause
   
   `batch-write` spans two payload families:
   
   - `TopicStream/Measure/TraceWrite` publish an `InternalWriteRequest` 
(`proto.Message`) → `messageToRequest` resolves the group via 
`GroupFromMessageData`. ✅
   - The five **secondary-index** sync topics 
(`MeasureSeriesIndexInsert/Update`, `StreamSeriesIndexWrite`, 
`StreamLocalIndexWrite`, `TraceSidxSeriesWrite`) publish **pre-marshaled 
`[]byte`** payloads. The `[]byte` branch of `messageToRequest` set `r.Body` but 
never `r.Group`, so `SendRequest.Group` went out empty and the subscriber 
(which mirrors `writeEntity.GetGroup()`) recorded `group=""`.
   
   The group name is actually known at the publish site (`g.name`) — it was 
only embedded inside the opaque body bytes, not surfaced on the wire field the 
metrics layer reads.
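
   The asymmetry between the two payload families can be reproduced in a 
minimal sketch. The types below (`sendRequest`, `writeRequest`) and the 
function `messageToRequestBefore` are simplified, hypothetical stand-ins, not 
the actual BanyanDB definitions; they only model the two branches described 
above.

```go
package main

import "fmt"

// sendRequest is a hypothetical stand-in for the wire request.
// Only the two fields relevant to this issue are modeled.
type sendRequest struct {
	Group string
	Body  []byte
}

// writeRequest is a hypothetical stand-in for a typed write message
// whose group can be resolved from the message itself.
type writeRequest struct {
	Group   string
	Payload []byte
}

// messageToRequestBefore models the pre-fix behavior: the typed branch
// resolves the group, the []byte branch only copies the body.
func messageToRequestBefore(data any) (sendRequest, error) {
	var r sendRequest
	switch d := data.(type) {
	case *writeRequest:
		r.Group = d.Group
		r.Body = d.Payload
	case []byte:
		// The group is buried inside the opaque bytes, so r.Group stays empty.
		r.Body = d
	default:
		return r, fmt.Errorf("unsupported payload type %T", data)
	}
	return r, nil
}

func main() {
	typed, _ := messageToRequestBefore(&writeRequest{Group: "sw_trace", Payload: []byte("p")})
	raw, _ := messageToRequestBefore([]byte("pre-marshaled index document"))
	fmt.Printf("typed payload group=%q\n", typed.Group)
	fmt.Printf("raw payload group=%q\n", raw.Group)
}
```

   In this sketch the raw payload leaves `Group` at its zero value, which is 
what the subscriber then records as `group=""`.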
   
   ### Fix
   
   Thread the business group out-of-band on the bus message:
   
   - `pkg/bus`: add `Message.group` + `Group()` getter and 
`NewMessageWithNodeAndGroup(id, node, group, data)`.
   - `banyand/queue/pub/pub.go`: `messageToRequest` sets `r.Group = m.Group()` 
on the `[]byte` branch; `Broadcast` preserves `messages.Group()` when fanning 
out per node.
   - The four secondary-index publish sites pass `g.name`.
   
   The subscriber needs no change — once the wire field is populated it labels 
correctly.
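
   As a rough illustration of the change, the sketch below carries the group 
on the message itself and copies it onto the request. The lowercase names 
(`message`, `newMessageWithNodeAndGroup`, `messageToRequest`, `fanOut`, 
`sendRequest`) and the `int64` id are assumptions made for this example; the 
real `pkg/bus` and `banyand/queue/pub` types and signatures may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// message is a hypothetical, simplified bus message that carries the
// business group out-of-band, next to the opaque payload.
type message struct {
	id    int64
	node  string
	group string
	data  any
}

func newMessageWithNodeAndGroup(id int64, node, group string, data any) message {
	return message{id: id, node: node, group: group, data: data}
}

func (m message) Group() string { return m.group }
func (m message) Data() any     { return m.data }

// sendRequest is a hypothetical stand-in for the wire request.
type sendRequest struct {
	Group string
	Body  []byte
}

// messageToRequest models only the []byte branch after the fix:
// the group comes from the message, not from the payload bytes.
func messageToRequest(m message) (sendRequest, error) {
	body, ok := m.Data().([]byte)
	if !ok {
		return sendRequest{}, errors.New("this sketch only handles []byte payloads")
	}
	return sendRequest{Group: m.Group(), Body: body}, nil
}

// fanOut models a broadcast: one message per node, with the group preserved.
func fanOut(m message, nodes []string) []message {
	out := make([]message, 0, len(nodes))
	for _, n := range nodes {
		out = append(out, newMessageWithNodeAndGroup(m.id, n, m.Group(), m.Data()))
	}
	return out
}

func main() {
	m := newMessageWithNodeAndGroup(1, "", "sw_metricsMinute", []byte("index document"))
	for _, perNode := range fanOut(m, []string{"data-0", "data-1"}) {
		r, err := messageToRequest(perNode)
		if err != nil {
			panic(err)
		}
		fmt.Printf("node=%s group=%q\n", perNode.node, r.Group)
	}
}
```

   Keeping the group as a separate field means the publisher does not have to 
unmarshal the pre-marshaled body just to label a metric.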
   
   ### Verification
   
   - New unit test `TestMessageToRequestGroup` (proto→group-from-metadata, 
`[]byte`+group→set, `[]byte` without group→empty).
   - Local CI green on this change: `make build` / `make lint` (0 issues) / 
`make license-check` / `make check` (clean tree); affected-package tests with 
`-race` (`pkg/bus`, `banyand/queue/pub`, `banyand/queue/sub`, `api/data`) all 
pass.
   - **Live cluster**: after rolling the liaison, the empty-group `batch-write` 
increase dropped from ~3479/10min to **0**, and all increments now carry their 
real group (`sw_metricsMinute`, `sw_trace`, …).

