mrproliu opened a new pull request, #1188:
URL: https://github.com/apache/skywalking-banyandb/pull/1188

   ## Summary
   
   Following 
https://github.com/apache/skywalking-banyandb/actions/runs/27990546888/job/82841967908.
 The scheduled **Flaky Test** workflow on `main` intermittently fails in the
   `banyand/internal/dump/measure` suite:
   
   ```
   --- FAIL: TestMeasureIndexedTagResolvedFromIndex (61.14s)
       suite_test.go:362: Condition never satisfied
       Messages:           series index not fully persisted after stop
   ```
   
   While investigating, the root cause turned out to expose a **real production
   durability gap** in the series index, not just a slow test. This PR fixes 
both:
   
   - **(A)** makes the test resilient to persister scheduling pressure under 
`-race`;
   - **(B)** persists the series index synchronously on graceful shutdown so an
     offline reader (dump / migration / offline CLI) always sees a complete 
index.
   
   ## Root cause
   
   The failing assertion polls the **on-disk** series index (`sidx`) and waits 
for
   both written series to become visible:
   
   ```go
   require.Eventually(t, func() bool {
       count, _ := inverted.ReadOnlyDocCount(sidxPath) // reads the persisted 
snapshot only
       return count >= int64(total)                    // total = 2
   }, 60*time.Second, 100*time.Millisecond, ...)
   ```
   
   `ReadOnlyDocCount` opens a fresh read-only bluge reader, so it only sees
   documents that have already been **persisted to disk** — not the ones still 
held
   in the writer's in-memory segments.
   
   Tracing how the series index reaches disk:
   
   1. The standalone write loop calls `IndexDB().Insert(...)` 
(`write_standalone.go`),
      which goes through `bluge.Writer.Batch`.
   2. The test runs with `--measure-flush-timeout=200ms`. In 
`measure/metadata.go`
      this is converted to whole seconds (`flushTimeout.Nanoseconds() / 
int64(time.Second)`),
      which **truncates to `0`**, so `BatchWaitSec = 0` and bluge is configured
      **without** `WithUnsafeBatches`.
   3. In "safe batch" mode, bluge's `prepareSegment` blocks the `Batch()` call 
until
      the batch is **persisted** (`introduction.persisted` channel). The 
persistence
      itself is performed by the background **persister goroutine**; `Batch()` 
simply
      blocks on it.
   4. Crucially, `bluge.Writer.Close()` does **not** flush pending in-memory
      segments: it signals the persister to stop (`close(closeCh)` → the 
persister
      `break`s without a final flush) and then drops the in-memory root
      (`replaceRoot(nil)`).
   
   Putting the timeline together for the failing run:
   
   - `mustAddDataPoints` (part) runs *before* `Insert` (series index) in the 
same
     goroutine; the part is flushed to disk by a separate flusher, so the 
earlier
     `findPartDirs > 0` check passes.
   - Meanwhile the `Insert` may still be **blocked** on 
`<-introduction.persisted`,
     because the persister has not persisted it yet.
   - The line-362 poll therefore keeps reading `0/1` on disk until it times out.
   
   So the flake is the **bluge persister/introducer goroutine being starved** 
under
   the heavy parallel `-race` CI run, which keeps the synchronous `Insert` (and 
the
   on-disk count) blocked for the full 60s. With `nap=0` this step is normally
   sub-millisecond, which is why it only fails occasionally.
   
   A self-aggravating factor: the `Eventually` opens a brand-new bluge reader 
every
   100ms (600 times), competing for CPU/IO with the very persister goroutine it 
is
   waiting on.
   
   ### The production gap this uncovered
   
   Because `Close()` discards unpersisted in-memory documents, a database that 
is
   **gracefully shut down** while the async persister still has a backlog (in
   production it runs with unsafe batches and a 5s nap) can leave the on-disk 
series
   index **incomplete**. The dump / migration / offline-CLI tools read a 
quiesced,
   on-disk database, so they could miss the most recently written series. This 
is a
   correctness gap independent of the test flake.
   
   ## Changes
   
   ### B — durable series index on graceful shutdown
   
   - `pkg/index/index.go` — add `Flush() error` to the `Store` interface.
   - `pkg/index/inverted/inverted.go` — implement `store.Flush()`: submit an 
empty
     batch carrying a persisted callback and block until it fires. Because the
     introducer always advances the epoch, the callback fires once the root 
snapshot
     (covering all earlier in-memory segments) is durably on disk.
   - `banyand/internal/storage/index.go` — add `seriesIndex.Flush()`.
   - `banyand/internal/storage/segment.go` — in `segmentController.close()` (the
     graceful-shutdown path), `Flush()` each live segment's index **before** 
closing
     it. Segments flagged for deletion are skipped (their directory is removed
     anyway). This runs **only on shutdown** — not on the idle-reclaim or delete
     paths, which would otherwise force a synchronous persist on a hot path.
   
   ### A — make the test resilient
   
   - `banyand/internal/dump/measure/suite_test.go` — poll at a coarse `1s` 
interval
     (was `100ms`) and raise the timeout to `90s` (was `60s`). The coarse 
interval
     stops the read-only-reader loop from starving the persister it is waiting 
on.
   
   > ⚠️ Implementation note: `Flush()` was first placed in 
`seriesIndex.Close()`,
   > which is also called from idle-reclaim and pre-delete paths. Forcing a
   > synchronous persist on every segment close blew the `storage` suite up to a
   > 400s timeout. Narrowing it to the shutdown path only brought it back to 
~35s.
   
   ## Testing
   
   All run locally with `-race`:
   
   - `go build ./...`, `go vet` — pass
   - `golangci-lint` (project-pinned v1.64.8) — clean
   - `banyand/internal/storage` — pass, **35.1s** (was a 400s timeout with the
     overly-broad flush placement)
   - `banyand/internal/dump/measure` — pass, 9.9s
   - `pkg/index/inverted` — pass
   - `TestMeasureIndexedTagResolvedFromIndex` ×3 — pass
   
   
   
   - [ ] If this pull request closes/resolves/fixes an existing issue, replace 
the issue number. Fixes apache/skywalking#<issue number>.
   - [ ] Update the [`CHANGES` 
log](https://github.com/apache/skywalking-banyandb/blob/main/CHANGES.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to