hanahmily opened a new pull request, #1129:
URL: https://github.com/apache/skywalking-banyandb/pull/1129
## Summary
- Lands the vectorized measure query subsystem (G1–G8) on a new analyzer +
plan tree (no leaf substitution into the row plan), wired into
`banyand/query/processor.go` ahead of the row-path `logical_measure.Analyze`.
- Coverage at merge: full scan parity (passthrough columns, zero-alloc
egress) plus single-node `GroupBy+Agg` (SUM/COUNT/MIN/MAX/MEAN) with
row-path-equivalent FieldValue oneof types and first-seen carry-forward of
non-key projected tags.
- `--measure-vectorized-enabled` is **default on**. Rollback is one flag
flip plus a restart, after which the row path serves all measure queries
again. The intermediate `--measure-vectorized-aggregation-enabled` flag is
removed; `GroupBy+Agg` dispatch is no longer separately gated.
- Distributed Map-mode `GroupBy+Agg`, TopN, requests with `order_by`, and
requests with hidden criteria tags continue to flow through the row path
(eligibility gate documented in
`pkg/query/vectorized/measure/plan/dispatch.go`).
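
The fall-through rules listed above can be pictured as a small gate that runs before either path is chosen. The following is only a sketch under stated assumptions: the `Request` struct, its field names, `Path`, and `choosePath` are hypothetical and are not taken from `dispatch.go`, whose actual checks may be structured differently.

```go
package main

import "fmt"

// Request is a hypothetical, simplified view of a measure query.
// The field names are illustrative and are not BanyanDB's API.
type Request struct {
	EmitPartial        bool // distributed Map-mode GroupBy+Agg
	TopN               bool
	OrderBy            bool
	HiddenCriteriaTags bool
}

// Path names which execution path serves a request.
type Path string

const (
	PathVectorized Path = "vectorized"
	PathRow        Path = "row"
)

// choosePath returns the selected path and a short reason.
// vecEnabled stands in for the --measure-vectorized-enabled flag.
func choosePath(vecEnabled bool, r Request) (Path, string) {
	switch {
	case !vecEnabled:
		return PathRow, "vectorized path disabled by flag"
	case r.EmitPartial:
		return PathRow, "distributed Map-mode aggregation"
	case r.TopN:
		return PathRow, "TopN"
	case r.OrderBy:
		return PathRow, "order_by"
	case r.HiddenCriteriaTags:
		return PathRow, "hidden criteria tags"
	default:
		return PathVectorized, "eligible"
	}
}

func main() {
	p, why := choosePath(true, Request{OrderBy: true})
	fmt.Println(p, why)
	p, why = choosePath(true, Request{})
	fmt.Println(p, why)
}
```

Returning a reason next to the path is a choice made for this sketch; it makes the fallback cause easy to log or assert on in a parity test.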
## Highlights
- Columnar `RecordBatch` pipeline (`pkg/query/vectorized`) with passthrough
`*modelv1.{Tag,Field}Value` columns by default to preserve the row path's
zero-alloc egress on the frozen gRPC wire format; native typed columns are
emitted only for the `GroupBy` keys and the `Agg` field the operator reduces over.
- Storage emits native typed columns via `MeasureBatchResult.PullBatch`
(`banyand/measure/query.go` + `batch_decode.go`); the vectorized path's
`BatchSourceFromBatchResult` consumes them without going through the row-path
decode pass.
- `BatchAggregation` operator
(`pkg/query/vectorized/measure/aggregation.go`) folds via
`pkg/query/aggregation.Map[N]`, keeping numeric semantics in lockstep with the
row path. Non-key projected tags are captured per group (first seen) so output
matches `measure_plan_aggregation.go`.
- G8d schema bridge: `plan.Analyze` and `dispatch.Dispatch` thread
`GroupBy/Agg` into the same `BuildBatchSchema` call storage uses, so the
operator-facing schema and storage's `result.batchSchema` agree on column types.
- Distributed `emitPartial=true` requests fall through in `tryVecDispatch`
because the vectorized path implements only `AggModeAll`; the row path's
Map/Reduce split preserves distributed correctness.
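
To illustrate the per-group fold with first-seen carry-forward of non-key tags described above, here is a minimal sketch. It assumes a single string group key, one non-key tag, and `int64` field values. `Row`, `Group`, and `Aggregate` are hypothetical names and do not reproduce `BatchAggregation` or `pkg/query/aggregation.Map[N]`. The mean is computed as `float64` here for simplicity; the real numeric semantics are whatever the row path defines.

```go
package main

import "fmt"

// Row is a hypothetical flattened data point: a group key, one
// non-key projected tag, and the field value being aggregated.
type Row struct {
	Key   string
	Tag   string
	Value int64
}

// Group holds running aggregates plus the first tag value seen for the key.
type Group struct {
	Key                  string
	FirstTag             string
	Sum, Count, Min, Max int64
}

// Mean is derived from Sum and Count. Count is at least 1 for any
// group that Aggregate returns.
func (g Group) Mean() float64 { return float64(g.Sum) / float64(g.Count) }

// Aggregate folds rows into groups in first-seen key order.
// Later rows update the aggregates but never overwrite FirstTag.
func Aggregate(rows []Row) []Group {
	index := make(map[string]int) // key -> position in out
	var out []Group
	for _, r := range rows {
		i, ok := index[r.Key]
		if !ok {
			index[r.Key] = len(out)
			out = append(out, Group{
				Key: r.Key, FirstTag: r.Tag,
				Sum: r.Value, Count: 1, Min: r.Value, Max: r.Value,
			})
			continue
		}
		g := &out[i]
		g.Sum += r.Value
		g.Count++
		if r.Value < g.Min {
			g.Min = r.Value
		}
		if r.Value > g.Max {
			g.Max = r.Value
		}
	}
	return out
}

func main() {
	groups := Aggregate([]Row{
		{Key: "svc-a", Tag: "east", Value: 3},
		{Key: "svc-b", Tag: "west", Value: 10},
		{Key: "svc-a", Tag: "north", Value: 5},
	})
	for _, g := range groups {
		fmt.Println(g.Key, g.FirstTag, g.Sum, g.Count, g.Min, g.Max, g.Mean())
	}
}
```

Keeping groups in a slice with a map index makes the output order deterministic (first-seen key order), which is convenient when comparing results between two execution paths.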
## Test plan
- [x] `make license-check && make build && make lint && make check` (no
drift)
- [x] `make test-ci PKG=./banyand/...` — 26 suites pass
- [x] `make test-ci PKG=./pkg/...` — 39 suites pass
- [x] `make test-ci PKG=./bydbctl/...` — 88/88 specs pass
- [x] `make test-ci PKG=./fodc/...` — 15/16 specs pass (1 skipped)
- [x] `make test-ci PKG=./test/integration/standalone/...` — 12 suites pass
including the 488-spec vectorized parity gate
(`--measure-vectorized-enabled=true`)
- [x] `make test-ci PKG=./test/integration/distributed/...` — 11 suites pass