(skywalking-banyandb) 11/11: Prepare for the v0.10.3 release

butterbright Fri, 05 Jun 2026 23:08:46 -0700

This is an automated email from the ASF dual-hosted git repository.

ButterBright pushed a commit to branch v0.10.x
in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git


commit f45404a6be0a4667c81d43542fd86601e2c536e0
Author: ButterBright <[email protected]>
AuthorDate: Sat Jun 6 14:08:28 2026 +0800

    Prepare for the v0.10.3 release
---
 CHANGES.md               | 38 +++++++++++++++++++++-----------------
 banyand/trace/metrics.go |  1 -
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index c6d75b767..ba6f48272 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -6,25 +6,43 @@ Release Notes.
 
 ### Bug Fixes
 
+- Persist segment end time in per-segment metadata so boundaries don't shift 
across restarts or config changes.
+- Fix flaky on-disk integration tests caused by Ginkgo v2 random container 
shuffling closing gRPC connections prematurely.
+- ui: fix query editor refresh/reset behavior and BydbQL keyword highlighting.
+- Fix flaky `file_snapshot` subtest in measure/stream/trace by waiting until 
every introduced mem part has been flushed to disk, instead of only checking 
the latest snapshot creator.
+- Fix flaky `TestCollectWithPartialClosedSegments` by raising 
`SegmentIdleTimeout` so wall-clock variance on slow CI does not mark still-open 
segments as idle.
+- Fix FODC lifecycle cache poisoning where transient `InspectAll` failures 
were cached for 10 minutes and masked liaison recovery; raise FODC agent and 
proxy timeouts from 10s to 40s.
+- Fix FODC `/cluster/lifecycle` dropping zero-valued group fields (e.g. 
`replicas=0`, `close=false`) under `encoding/json` + `omitempty`; switch to 
`protojson` so all fields are emitted (nil nested messages serialize as `null`).
+- Fix trace `block_writer` panic on out-of-order timestamps within the same 
traceID, which dropped one trace-write batch per panic in multi-agent 
SkyWalking deployments. Spans of a single trace originate from 
independently-clocked services, and trace storage is organized by traceID 
rather than timestamp, so per-traceID timestamp monotonicity is not a writer 
invariant.
+- Fix nil-pointer panic on cold-tier data nodes when FODC `InspectAll` raced 
with idle-segment cleanup.
+- Add `GroupLifecycleInfo.errors` to surface per-group collection failures 
from FODC `InspectAll` instead of silently dropping the affected node entry.
+- Fix `CollectDataInfo` and `CollectLiaisonInfo` not handling 
`CATALOG_PROPERTY` groups.
+- Close BanyanDB merge write-path durability gap that allowed torn parts to be 
created by a crash between data write and metadata commit. Metadata files 
(`metadata.json` for trace/measure/stream, `manifest.json` for sidx, plus 
`traceID.filter` and `tag.type`) now go through a new `WriteAtomic` (write-tmp 
+ fsync + rename + fsync-dir) sequence; data writers (`seqWriter.Close`, 
`localFileSystem.Write`) now propagate fdatasync errors instead of silently 
dropping them. `mustOpenFilePart` / ` [...]
+- Fix lifecycle migration where the receiving node could create segments 
shorter than the configured `SegmentInterval`.
+- Fail fast on incompatible storage version at boot. Previously the server 
would start in a degraded `SERVING` state with affected groups un-loaded 
because the property schema-registry retry loop swallowed the 
version-incompatibility panic. Compatible versions are listed in 
`banyand/internal/storage/versions.yml`.
+- Release bluge index writers on segment rotation so `analysisWorker` pools 
sized from `GOMAXPROCS` don't accumulate across rotations. Two layered defects 
kept the existing idle-segment reclaim path from running: `segmentIdleTimeout` 
defaulted to `0` (which disabled the 10-minute reclaim ticker), and `incRef` 
refreshed `lastAccessed` on every rotation tick so `closeIdleSegments` never 
observed an idle segment. Defaults to `time.Hour`, moves the `lastAccessed` 
bump to real read/write call [...]
+- Fix incorrect counts and missing trace fields in the lifecycle migration 
report.
 - Fix trace query identity-tag projection: when `trace_id`/`span_id` are 
explicitly projected, reconstruct them from span identity at response build 
time instead of requesting them as stored tags, and preserve tag order with 
null-filled per-span value alignment in the distributed trace result iterator.
 - Fix FODC proxy corrupting Prometheus metric types. The agent dropped the `# 
TYPE` line while parsing banyandb `/metrics`, the `StreamMetrics` proto carried 
no type field, and the proxy guessed the type from a name-suffix heuristic — 
downgrading counters to gauge, mislabeling `_count`-suffixed counters as 
histograms, and splitting summaries into two conflicting `# TYPE` lines. 
Capture the type with the Prometheus `expfmt` parser, store it in the flight 
recorder, thread it through a new  [...]
 - Trace storage metrics now expose the `storage` sub-scope, matching the 
`stream_storage_*` naming. The `StorageMetricsFactory` for trace switched from 
the root `trace` scope to `trace.storage`, so per-segment inverted-index 
metrics (`inverted_index_total_updates`, `inverted_index_total_doc_count`, 
`inverted_index_total_term_searchers_started`) are now emitted as 
`banyandb_trace_storage_*` instead of `banyandb_trace_*`, aligning the 
dashboard query names. Other trace metrics (`trace_tst_ [...]
 - Fix FODC agent labeling metrics with `node_role="ROLE_UNSPECIFIED"`. The 
agent resolved the node role exactly once at startup via a single 
`GetCurrentNode` poll whose endpoint retries spanned only ~1s; when the sibling 
lifecycle/banyandb gRPC server was not yet listening (`connect: cannot assign 
requested address`) the role fell back to `ROLE_UNSPECIFIED` permanently, so 
most nodes never reported their real `ROLE_DATA`/`ROLE_LIAISON`. Retry the 
initial node-role resolution with exponen [...]
 
+### Chores
+
+- Regenerate expired TLS test certificate with 100-year validity.
+- Set Ginkgo `--repeat` to 0 in the flaky-test workflow so the hourly run 
completes within the 50-minute timeout.
+
 ## 0.10.2
 
 ### Bug Fixes
 
 - Fix reuse of byte arrays in min/max implementation causing data corruption.
 - Fix index-mode measure queries returning documents outside requested time 
range.
-- Close BanyanDB merge write-path durability gap that allowed torn parts to be 
created by a crash between data write and metadata commit. Metadata files 
(`metadata.json` for trace/measure/stream, `manifest.json` for sidx, plus 
`traceID.filter` and `tag.type`) now go through a new `WriteAtomic` (write-tmp 
+ fsync + rename + fsync-dir) sequence; data writers (`seqWriter.Close`, 
`localFileSystem.Write`) now propagate fdatasync errors instead of silently 
dropping them. `mustOpenFilePart` / ` [...]
 - Fix bydbctl command tests using global stdout capture, which caused 
race-enabled runs to corrupt captured command output.
 - Use `topic` instead of `session_id` as the Prometheus label on liaison 
`queue_sub` chunk-ordering counters to avoid unbounded metric cardinality.
 - Fix flaky trace query filtering caused by non-deterministic sidx tag 
ordering and add consistency checks for integration query cases.
 - MCP: Add validation for properties and harden the mcp server.
 - Fix property schema client connection not stable after data node restarted.
-- Fix flaky on-disk integration tests caused by Ginkgo v2 random container 
shuffling closing gRPC connections prematurely.
-- ui: fix query editor refresh/reset behavior and BydbQL keyword highlighting.
 - Disable the rotation task on warm and cold nodes to prevent incorrect 
segment boundaries during lifecycle migration.
 - Prevent epoch-dated segment directories (seg-19700101) from being created by 
zero timestamps in distributed sync paths.
 - Fix SIDX streaming sync sending SegmentID as MinTimestamp instead of the 
actual timestamp, causing sync failures on the receiving node.
@@ -39,24 +57,10 @@ Release Notes.
 - Fix `FileSystemError` not satisfying `errors.Is(err, io/fs.ErrNotExist)`, 
which prevented the segment controller from cleaning up half-born segment 
directories and left groups in a permanent zombie state after a crash or 
partial sync.
 - Fix lifecycle migration panic when a stream shard's snapshot has no element 
index (`idx/`) directory.
 - Avoid FODC lifecycle inspection failing on busy data nodes by raising the 
per-broadcast `CollectDataInfo` / `CollectLiaisonInfo` deadline from 5s to 30s 
and parallelizing per-group inspection in the cluster-internal `InspectAll`.
-- Fix flaky `file_snapshot` subtest in measure/stream/trace by waiting until 
every introduced mem part has been flushed to disk, instead of only checking 
the latest snapshot creator.
-- Fix flaky `TestCollectWithPartialClosedSegments` by raising 
`SegmentIdleTimeout` so wall-clock variance on slow CI does not mark still-open 
segments as idle.
-- Fix FODC lifecycle cache poisoning where transient `InspectAll` failures 
were cached for 10 minutes and masked liaison recovery; raise FODC agent and 
proxy timeouts from 10s to 40s.
-- Fix FODC `/cluster/lifecycle` dropping zero-valued group fields (e.g. 
`replicas=0`, `close=false`) under `encoding/json` + `omitempty`; switch to 
`protojson` so all fields are emitted (nil nested messages serialize as `null`).
-- Fix trace `block_writer` panic on out-of-order timestamps within the same 
traceID, which dropped one trace-write batch per panic in multi-agent 
SkyWalking deployments. Spans of a single trace originate from 
independently-clocked services, and trace storage is organized by traceID 
rather than timestamp, so per-traceID timestamp monotonicity is not a writer 
invariant.
-- Fix nil-pointer panic on cold-tier data nodes when FODC `InspectAll` raced 
with idle-segment cleanup.
-- Add `GroupLifecycleInfo.errors` to surface per-group collection failures 
from FODC `InspectAll` instead of silently dropping the affected node entry.
-- Fix `CollectDataInfo` and `CollectLiaisonInfo` not handling 
`CATALOG_PROPERTY` groups.
-- Fix lifecycle migration where the receiving node could create segments 
shorter than the configured `SegmentInterval`.
-- Fail fast on incompatible storage version at boot. Previously the server 
would start in a degraded `SERVING` state with affected groups un-loaded 
because the property schema-registry retry loop swallowed the 
version-incompatibility panic. Compatible versions are listed in 
`banyand/internal/storage/versions.yml`.
-- Release bluge index writers on segment rotation so `analysisWorker` pools 
sized from `GOMAXPROCS` don't accumulate across rotations. Two layered defects 
kept the existing idle-segment reclaim path from running: `segmentIdleTimeout` 
defaulted to `0` (which disabled the 10-minute reclaim ticker), and `incRef` 
refreshed `lastAccessed` on every rotation tick so `closeIdleSegments` never 
observed an idle segment. Defaults to `time.Hour`, moves the `lastAccessed` 
bump to real read/write call [...]
-- Fix incorrect counts and missing trace fields in the lifecycle migration 
report.
 
 ### Chores
 
 - Upgrade Go and npm dependencies including etcd to v3.6.10, OpenTelemetry to 
v1.43.0, AWS SDK, and Google Cloud libraries.
-- Regenerate expired TLS test certificate with 100-year validity.
-- Set Ginkgo `--repeat` to 0 in the flaky-test workflow so the hourly run 
completes within the 50-minute timeout.
 
 ## 0.10.0
 
diff --git a/banyand/trace/metrics.go b/banyand/trace/metrics.go
index 0c3594e2f..d52de056d 100644
--- a/banyand/trace/metrics.go
+++ b/banyand/trace/metrics.go
@@ -20,7 +20,6 @@ package trace
 import (
        "github.com/apache/skywalking-banyandb/api/common"
        "github.com/apache/skywalking-banyandb/banyand/internal/storage"
-       "github.com/apache/skywalking-banyandb/banyand/observability"
        "github.com/apache/skywalking-banyandb/pkg/index/inverted"
        "github.com/apache/skywalking-banyandb/pkg/meter"
 )

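
The `/cluster/lifecycle` entry in CHANGES.md above attributes the dropped `replicas=0` and `close=false` fields to `encoding/json` combined with `omitempty`. The sketch below reproduces only that standard-library behavior. The struct types and field names are hypothetical stand-ins rather than the project's generated messages, and the `protojson` side of the fix is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// groupWithOmit is a hypothetical struct whose tags carry omitempty,
// as struct tags on generated protobuf Go types typically do.
type groupWithOmit struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas,omitempty"`
	Close    bool   `json:"close,omitempty"`
}

// groupWithoutOmit has the same fields without omitempty.
type groupWithoutOmit struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas"`
	Close    bool   `json:"close"`
}

// marshal encodes v with encoding/json and returns the JSON text.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Zero-valued Replicas and Close vanish when omitempty is set.
	fmt.Println(marshal(groupWithOmit{Name: "g1"}))
	// prints {"name":"g1"}

	// Without omitempty the zero values are emitted explicitly.
	fmt.Println(marshal(groupWithoutOmit{Name: "g1"}))
	// prints {"name":"g1","replicas":0,"close":false}
}
```

A consumer reading the first form cannot tell "zero replicas" from "field not reported", which is the ambiguity the changelog entry describes.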