[PR] perf(trace, stream): faster point-lookup queries via lazy block-metadata decode [skywalking-banyandb]

via GitHub Wed, 17 Jun 2026 20:55:28 -0700


hanahmily opened a new pull request, #1182:
URL: https://github.com/apache/skywalking-banyandb/pull/1182


   ## Summary
   
   Performance work on the trace/stream query read path, the distributed benchmark used to
   measure it, and a fix for the replicated-schema integration setup.
   
   The headline change makes **point-lookup queries decode only the block metadata they need**.
   Previously a query decoded *every* `blockMetadata` entry of a ~128KB primary-block granule
   (building each entry's tag/column map) just to use the few entries whose key was requested.
   Both the trace iterator (keyed by `traceID`) and the stream iterator (keyed by `seriesID`)
   already locate the wanted entries via binary search, and `findBlock` only ever returns
   entries whose key is in the queried set, so decoding the rest is wasted work. We now
   merge-walk the sorted granule against the sorted query keys and fully decode only the
   matched entries, cheaply skipping the others without allocating their maps. The result is
   byte-identical to the full decode filtered to the queried keys (locked by equivalence unit
   tests); migration paths keep the full decode.
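
The merge-walk described above can be sketched roughly as follows. This is an illustration only, not the PR's code: `rawEntry`, `decoded`, `decodeEntry`, and `decodeFiltered` are hypothetical names, the key is modelled as a `uint64` (as for a `seriesID`), and the "expensive" decode is reduced to building a small map.

```go
package main

import "fmt"

// rawEntry is a hypothetical stand-in for one encoded blockMetadata record:
// the key is cheap to read, the tag payload is costly to decode.
type rawEntry struct {
	key      uint64   // sort key, e.g. a seriesID
	tagNames []string // stands in for the encoded tag/column section
}

// decoded is the fully decoded form, including the per-entry tag map.
type decoded struct {
	key  uint64
	tags map[string]int
}

// decodeEntry performs the "expensive" step: it allocates the tag map.
func decodeEntry(r rawEntry) decoded {
	tags := make(map[string]int, len(r.tagNames))
	for i, name := range r.tagNames {
		tags[name] = i
	}
	return decoded{key: r.key, tags: tags}
}

// decodeFiltered merge-walks the granule (sorted by key) against the query
// keys (sorted ascending) and decodes only entries whose key was requested.
func decodeFiltered(granule []rawEntry, keys []uint64) []decoded {
	var out []decoded
	k := 0
	for _, r := range granule {
		// Advance past query keys smaller than the current entry's key.
		for k < len(keys) && keys[k] < r.key {
			k++
		}
		if k == len(keys) {
			break // no query keys left, nothing further can match
		}
		if keys[k] != r.key {
			continue // not requested: skip without allocating a map
		}
		// Do not advance k here: several entries may share one key.
		out = append(out, decodeEntry(r))
	}
	return out
}

func main() {
	granule := []rawEntry{
		{1, []string{"a"}}, {3, []string{"a", "b"}}, {3, []string{"c"}},
		{5, []string{"a"}}, {8, []string{"b"}},
	}
	for _, d := range decodeFiltered(granule, []uint64{3, 8, 9}) {
		fmt.Println(d.key, len(d.tags))
	}
}
```

With the sample data, only the two entries for key 3 and the entry for key 8 are decoded; the entries for keys 1 and 5 never allocate a map, and the query key 9 simply finds no match.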
   
   ## Changes
   
   - **`perf(trace)`**: speed up the `trace_by_id` path in three ways: skip the wildcard
     series-index resolution for trace-id queries (its only consumer early-returns when
     `TraceIDs` is set); decode the on-disk columnar block directly into per-trace results in
     the vectorized path (bypassing the `blockCursor` middle format and the now-removed
     Phase-2 operator pipeline); and lazily decode primary-block metadata
     (`unmarshalBlockMetadataFiltered`).
   - **`perf(stream)`**: apply the same lazy primary-block-metadata decode to the stream query
     path (keyed by `seriesID`). *Not* applied to measure, whose `readPrimaryBlock` caches the
     full decoded granule across queries (a filtered decode would corrupt the shared cache,
     and the cache already amortizes the cost).
   - **`test(querybench)`**: distributed trace query benchmark harness (`trace_by_id`,
     `trace_tag_filter`) with a real-world cardinality matrix, Docker resource limits,
     CPU/heap profiling, and merged JSON/Markdown reports.
   - **`test(replication)`**: add the missing `sw_cross_segment*` groups to the replicated
     measure/stream/trace testdata. The replicated loaders create a curated group set, then
     load the standard resources (which include cross-segment fixtures); those groups were
     never mirrored, so the property-schema preload failed with `NotFound` in `BeforeSuite`.
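
To illustrate why the filtered decode is not applied to measure, here is a minimal sketch of the hazard, assuming a cache that stores one decoded granule per block and shares it across queries. `granuleCache`, `filteredDecoder`, and `contains` are hypothetical and much simpler than the real `readPrimaryBlock` cache.

```go
package main

import "fmt"

// granuleCache is a hypothetical cache of decoded granules shared by all
// queries, keyed by a block identifier.
type granuleCache struct {
	entries map[int][]uint64 // block ID -> decoded entry keys
}

// get returns the cached decode for blockID, or runs decode and stores it.
func (c *granuleCache) get(blockID int, decode func() []uint64) []uint64 {
	if v, ok := c.entries[blockID]; ok {
		return v
	}
	v := decode()
	c.entries[blockID] = v
	return v
}

// filteredDecoder returns a decode function that keeps only entries whose
// key equals want, mimicking a filtered decode.
func filteredDecoder(onDisk []uint64, want uint64) func() []uint64 {
	return func() []uint64 {
		var out []uint64
		for _, k := range onDisk {
			if k == want {
				out = append(out, k)
			}
		}
		return out
	}
}

// contains reports whether key is present in the decoded entries.
func contains(entries []uint64, key uint64) bool {
	for _, e := range entries {
		if e == key {
			return true
		}
	}
	return false
}

func main() {
	onDisk := []uint64{1, 3, 5, 8}
	c := &granuleCache{entries: map[int][]uint64{}}

	first := c.get(0, filteredDecoder(onDisk, 3))  // decodes and caches only key 3
	second := c.get(0, filteredDecoder(onDisk, 8)) // served from the cache

	fmt.Println(contains(first, 3), contains(second, 8)) // prints "true false"
}
```

The second query asks for key 8, which exists on disk, but it receives the granule that the first query cached after filtering to key 3. A shared cache therefore has to hold the full decode.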
   
   ## Benchmark (same-session A/B, Docker 4cpu/8g, 300 iters/30 warmup)
   
   `trace_by_id`, baseline → lazy-decode:
   
   | metric | 100k row | 100k vec | 1M row | 1M vec |
   | --- | --- | --- | --- | --- |
   | p50 ms | 3.05 → 1.86 | 2.96 → 2.19 | 3.18 → 2.06 | 3.65 → 2.42 |
   | mallocs/query | 16944 → 6278 | 23935 → 5913 | 16761 → 5853 | 16374 → 6088 |
   | QPS | 1034 → 1632 | 1095 → 1454 | 965 → 1523 | 877 → 1278 |
   
   CPU profile: the `unmarshalBlockMetadata` frame drops from ~18–29% cumulative to ~0;
   `readPrimaryBlock` is then just zstd decompression plus the disk read. `trace_tag_filter`
   (broad scan) shows no regression. The win scales with selectivity (largest for few-key
   lookups, a wash for full scans) and is shared by the row and vectorized (vec) paths, since
   the dominant point-lookup cost is the engine-agnostic metadata decode.
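
For readers who prefer relative numbers, the table can be turned into percentage changes with a few lines of Go. The figures are copied from the table above; `pctChange` is a helper introduced here for illustration.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	configs := []string{"100k row", "100k vec", "1M row", "1M vec"}
	// Each pair is {baseline, lazy-decode}, taken from the benchmark table.
	p50 := [][2]float64{{3.05, 1.86}, {2.96, 2.19}, {3.18, 2.06}, {3.65, 2.42}}
	mallocs := [][2]float64{{16944, 6278}, {23935, 5913}, {16761, 5853}, {16374, 6088}}
	qps := [][2]float64{{1034, 1632}, {1095, 1454}, {965, 1523}, {877, 1278}}

	for i, name := range configs {
		fmt.Printf("%-8s  p50 %+.1f%%  mallocs %+.1f%%  QPS %+.1f%%\n",
			name,
			pctChange(p50[i][0], p50[i][1]),
			pctChange(mallocs[i][0], mallocs[i][1]),
			pctChange(qps[i][0], qps[i][1]))
	}
}
```

Across the four configurations, mallocs per query fall by roughly 63% to 75%.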
   
   ## Testing
   
   - `make check` + `make lint` (full `.golangci.yml`): clean.
   - Unit: `banyand/trace`, `banyand/stream`, `pkg/query/vectorized/trace` pass. New
     equivalence tests `Test_unmarshalBlockMetadataFiltered` (trace + stream) lock the
     filtered-vs-full decode.
   - Integration: `distributed/query` and `standalone/query` (full suites) pass; `replication`
     passes in isolation (the `BeforeSuite` preload now succeeds).
   

