This is an automated email from the ASF dual-hosted git repository.

hanahmily pushed a commit to branch phase-2-cp5-march
in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git

commit 53933c0b6a1f8828fe8e09ced4c5c2555ebe701e
Author: Hongtao Gao <[email protected]>
AuthorDate: Thu May 7 23:59:47 2026 +0000

    docs(changes): record data-node NodeSchemaStatusService exposure in CHANGES.md

    Refines the §6.12 spec authoring entry now that the liaison-pause path
    is the only working approach (the deferred reason is the global
    notifiedModRevision watermark race, not data-node service exposure as
    previously noted), and adds two new bullets covering the
    queue.Server.SetNodeSchemaStatusRepo decoupling and the GetMaxRevision
    aggregation repair.

    via [HAPI](https://hapi.run)

    Co-Authored-By: HAPI <[email protected]>
    Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
---
 CHANGES.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 4141adca1..db242ee10 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -45,8 +45,10 @@ Release Notes.
 - Add observability for the schema-consistency cluster (Step 2.7, §A17): `schema_await_revision_applied_duration_seconds{result}`, `schema_await_schema_applied_duration_seconds{result}`, `schema_await_schema_deleted_duration_seconds{result}` track barrier latency by outcome (`applied` / `timeout` / `invalid_argument` / `error`); `schema_barrier_laggard_nodes_total{barrier,role,node}` decodes the `<role>-<Metadata.Name>` laggard identifier so dashboards can break out which member fell b [...]
 - Add the schema-barrier CP-6 SLO load harness (Step 2.8) under `test/load/schema_barrier/`, runnable via `make load-test-barrier`. The harness brings up an in-process 3 data node + 1 liaison cluster, drives 100 concurrent `AwaitRevisionApplied` callers + 10 `Group.Update` ops/sec, and reports p50 / p95 / p99 / max from client-side per-call duration after a 1-minute warm-up + 5-minute measurement window. Client-side latency is bounded above by the server-side histogram so the SLO check [...]
 - Land `pkg/test/setup.PauseDataNodeWatch` / `ResumeDataNodeWatch` (Step 1.0 follow-up): the helpers replace the `ErrWatchControlNotImplemented` stub with a working hook into `property.SchemaRegistry` so cluster-only specs can drive a single data node to fall behind the cluster while the rest stays in sync. The data node's `handleWatchEvent`, `processInitialResourceFromProperty`, and `handleDeletion` paths each gate events into a per-registry queue while paused; resume drains the queue [...]
-  - Extend the watch-control binding to liaison processes (`pkg/test/setup.startLiaisonNode`) and add `helpers.SharedContext.LiaisonAddr` so cluster-only specs can pause the receiving liaison's own `SchemaRegistry`. The cluster barrier's `selfName` probe reads through that SR, so pausing it surfaces a laggard via the public `AwaitX` RPCs without needing `NodeSchemaStatusService` exposed on data-node ports — the in-process distributed harness does not currently host that service on data n [...]
-  - Author §6.12 cluster-barrier integration specs (`test/cases/schema/barrier_cluster.go`): §6.12b (`AwaitSchemaApplied`) and §6.12c (`AwaitSchemaDeleted`) pin the public-API contract that a paused receiving liaison surfaces a non-empty `laggards` list and that resume drains the queue so the per-key barrier converges. §6.12a (`AwaitRevisionApplied`) and §6.12d (cross-barrier recovery) are checked in as `PIt` (pending) — the queue replay runs and the per-key barriers converge, but the gl [...]
+  - Extend the watch-control binding to liaison processes (`pkg/test/setup.startLiaisonNode`) and add `helpers.SharedContext.LiaisonAddr` so cluster-only specs can pause the receiving liaison's own `SchemaRegistry`. The cluster barrier's `selfName` probe reads through that SR, so pausing it surfaces a laggard via the public `AwaitX` RPCs.
+  - Author §6.12 cluster-barrier integration specs (`test/cases/schema/barrier_cluster.go`): §6.12b (`AwaitSchemaApplied`) and §6.12c (`AwaitSchemaDeleted`) pin the public-API contract that a paused receiving liaison surfaces a non-empty `laggards` list and that resume drains the queue so the per-key barrier converges. §6.12a (`AwaitRevisionApplied`) and §6.12d (cross-barrier recovery) are checked in as `PIt` (pending): the laggard-detection assertion passes but the post-resume `AwaitRev [...]
+  - Expose `cluster.v1.NodeSchemaStatusService` on data-node gRPC ports. Decouple the registration in `banyand/queue/sub/server.go`'s `Serve()` so `fodc.v1.GroupLifecycleService` (liaison-only by design) and `NodeSchemaStatusService` (per-node by design) are gated independently: the new `queue.Server.SetNodeSchemaStatusRepo(metadata.Service)` setter wires the per-node service without dragging along the liaison-shaped `GroupLifecycleService`. Liaison startup (`pkg/cmdsetup/liaison.go`) ca [...]
+  - Repair the `GetMaxRevision` aggregation on the per-node `NodeSchemaStatusService` (`banyand/metadata/schema/property/node_status.go`). The previous implementation returned `min(schemaCache.notifiedModRevision, NodeRepoRegistry.LatestModRevision)`, but `LatestModRevision` aggregated per-service `schemaRepo` watermarks via `min` — and each `schemaRepo` only advances on events for its own catalog (`pkg/schema/init.go:72` filters by `g.Catalog`), so the min was perpetually pinned to the [...]
 ### Bug Fixes
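The `PauseDataNodeWatch` / `ResumeDataNodeWatch` entry says each event path gates events into a per-registry queue while paused and that resume drains that queue. A minimal Go sketch of that gating pattern follows. The `watchGate` and `schemaEvent` types and their methods are hypothetical illustrations, not the actual `property.SchemaRegistry` code, whose details are truncated in the entry.

```go
package main

import (
	"fmt"
	"sync"
)

// schemaEvent is a hypothetical stand-in for a watch, initial-load, or deletion event.
type schemaEvent struct {
	Key      string
	Revision int64
}

// watchGate is a hypothetical per-registry gate. While paused, events are
// queued instead of applied; Resume replays the queue in arrival order.
type watchGate struct {
	mu     sync.Mutex
	paused bool
	queue  []schemaEvent
	apply  func(schemaEvent) // runs under mu, so it must not call back into the gate
}

func (g *watchGate) Pause() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.paused = true
}

// Deliver is the single entry point each event path would route through.
// Holding mu across apply keeps replayed and live events strictly ordered.
func (g *watchGate) Deliver(ev schemaEvent) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.paused {
		g.queue = append(g.queue, ev)
		return
	}
	g.apply(ev)
}

// Resume drains the queued events before unpausing, so no live event can
// overtake an event that arrived while the gate was paused.
func (g *watchGate) Resume() {
	g.mu.Lock()
	defer g.mu.Unlock()
	for _, ev := range g.queue {
		g.apply(ev)
	}
	g.queue = nil
	g.paused = false
}

func main() {
	var applied []int64
	g := &watchGate{apply: func(ev schemaEvent) { applied = append(applied, ev.Revision) }}

	g.Deliver(schemaEvent{Key: "group/a", Revision: 1})
	g.Pause()
	g.Deliver(schemaEvent{Key: "group/b", Revision: 2}) // queued: this node now lags
	g.Deliver(schemaEvent{Key: "group/a", Revision: 3}) // queued
	fmt.Println("while paused:", applied)               // while paused: [1]

	g.Resume()
	fmt.Println("after resume:", applied) // after resume: [1 2 3]
}
```

Draining under the same lock that live delivery takes is one simple way to preserve ordering; a real registry may well hand the drained batch to its own worker instead.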

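The `GetMaxRevision` entry explains why a `min` over per-catalog `schemaRepo` watermarks stalls: a repo whose catalog receives no events never advances, so it caps the minimum. The Go sketch below reproduces only that failure mode. All names are hypothetical (`catalogRepo`, `latestByMin`, `observedMaxRevision`, and the catalog strings), and the `notified` argument merely stands in for `schemaCache.notifiedModRevision`. The actual repair is truncated in the entry and is not reconstructed here.

```go
package main

import "fmt"

// catalogRepo is a hypothetical stand-in for a per-service schemaRepo whose
// watermark only advances on events for its own catalog.
type catalogRepo struct {
	catalog   string
	watermark int64
}

// schemaEvent is a hypothetical schema change tagged with its catalog.
type schemaEvent struct {
	catalog  string
	revision int64
}

// observe mirrors the per-catalog filter: events for other catalogs are ignored.
func (r *catalogRepo) observe(ev schemaEvent) {
	if ev.catalog == r.catalog && ev.revision > r.watermark {
		r.watermark = ev.revision
	}
}

func minInt64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

// latestByMin aggregates repo watermarks with min, as the entry says the
// previous LatestModRevision did.
func latestByMin(repos []*catalogRepo) int64 {
	latest := repos[0].watermark
	for _, r := range repos[1:] {
		latest = minInt64(latest, r.watermark)
	}
	return latest
}

// observedMaxRevision replays events into every repo and returns the
// min(notified, latestByMin) value that the previous GetMaxRevision computed.
func observedMaxRevision(repos []*catalogRepo, events []schemaEvent, notified int64) int64 {
	for _, ev := range events {
		for _, r := range repos {
			r.observe(ev)
		}
	}
	return minInt64(notified, latestByMin(repos))
}

func main() {
	repos := []*catalogRepo{{catalog: "stream"}, {catalog: "measure"}, {catalog: "trace"}}
	// Every schema change in this run touches stream or measure groups only.
	events := []schemaEvent{{"stream", 10}, {"measure", 11}, {"stream", 12}}
	fmt.Println(observedMaxRevision(repos, events, 12)) // 0: the idle "trace" repo pins the min
}
```

Even though the cluster has applied revision 12, the reported maximum stays at the idle repo's initial watermark, which is the "perpetually pinned" behavior the entry describes.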