hanahmily opened a new pull request, #1118:
URL: https://github.com/apache/skywalking-banyandb/pull/1118
## Summary
Closes the BanyanDB merge write-path durability gap that allowed torn parts
to be created on disk and survive across process restarts
(apache/skywalking#13862, root cause for #13861). The merger's response to a
torn part — `logger.Panicf` — is preserved unchanged; only the conditions under
which a torn part can exist on disk are tightened.
Three layered, independently-correct fixes plus a reproducer test:
1. **`pkg/fs` durability primitives** — `seqWriter.Close()` now propagates
the `fdatasync` error (was discarded); `localFileSystem.Write()` now `fsync`s
before close; new `WriteAtomic` helper does write-tmp → file.Sync → rename →
dir.Sync, with `MustFlushAtomic` as the panic-on-error wrapper and
`CleanupLeftoverTmp` as the on-open helper.
2. **Two-phase commit for every metadata-style write** in trace, measure,
stream, sidx — `metadata.json` / `manifest.json` / `traceID.filter` /
`tag.type` / `seriesMetadata.bin` / snapshot files all go through `WriteAtomic`
(or `MustFlushAtomic` for sidx). Streaming-sync sidecars in
`trace/write_data.go` also switched.
3. **Per-engine startup cleanup** — each engine's `mustOpenFilePart` /
`mustOpenPart` calls `fs.CleanupLeftoverTmp` to remove safe post-rename `.tmp`
leftovers; sidx `loadPartMetadata` no longer silently falls back to legacy
`meta.bin` when a stranded `manifest.json.tmp` exists (would have masked the
incomplete-commit state).
4. **Reproducer test** — `Test_merger_tornSpansBin_stillPanics` forges the
production-shape on-disk state and asserts the merger still panics with one of
the canonical fail-fast messages, guarding the contract against future
refactors.
## Why the fix shape
Earlier draft converted the six \`offset must be equal to bytesRead\` panics
to errors and added a `recover()` boundary in `mergeParts`. Per maintainer
design feedback, the panic-on-corruption pattern is BanyanDB's deliberate
fail-fast contract and was preserved. The fix is purely preventive: it prevents
torn parts from being **created** by a crash mid-merge or by a silent
`fdatasync` error. Torn parts that arrive on disk via external causes (rsync,
host disk failure, pre-fix-deployment leftovers) still trigger the canonical
panic.
Two reviewer passes (architect critic + GitHub Copilot CLI) caught
additional issues that were addressed inline:
- `WriteAtomic` no longer removes the `.tmp` on rename failure (forensic
value).
- 11 redundant `SyncPath` calls removed (each `WriteAtomic` already does
`dir.Sync`); trace was paying up to 4 dir fsyncs per finalize, now 1.
- `seriesMetadata.bin` and snapshot files were missed in the original audit;
both now atomic.
## Test plan
- [x] \`make license-check\` / \`check-req\` / \`build\` / \`lint\` /
\`check\` — green
- [x] Unit tests: \`./banyand/...\`, \`./bydbctl/...\`, \`./pkg/...\`,
\`./fodc/...\` — green
- [x] Integration tests: \`./test/integration/standalone/...\`,
\`./test/integration/distributed/...\` — green
- [x] Targeted package suites: pkg/fs (21/21 specs, +5 new), trace 22.1s,
measure 29.3s, stream 17.4s, sidx 3.4s
- [x] New per-engine \`merger_durability_test.go\` files (4 tests × 4
engines, 11 cases total) verify atomic metadata commit + leftover-tmp cleanup
- [x] \`Test_merger_tornSpansBin_stillPanics\` (trace) actively guards the
canonical panic-on-torn-part contract; captured panic shape: \`cannot read
data: unexpected EOF\`
- [ ] e2e (FODC, OAP) — skipped locally; require Kubernetes/Kind. Will run
via CI.
## Out of scope
- No conversion of `logger.Panicf("offset N must be equal to bytesRead M",
...)` sites to error returns.
- No `recover()` boundary in `mergeParts`. Process exit on torn data is
preserved.
- No quarantine of torn parts to a `corrupt/` subdirectory. Operator handles
via existing snapshot-based orphan cleanup at startup.
- No new metric. Pod restart count from kubelet remains the operator signal.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]