wu-sheng opened a new pull request, #1158:

URL: https://github.com/apache/skywalking-banyandb/pull/1158
## Summary

Deep-reviewed the storage engines against the **code** (code is the source of truth; docs/diagrams are reference only), corrected the storage/file-format documentation, and added a new **API-first** storage & file-format reference. Also replaced the file-structure PNG diagrams with inline mermaid.

### Why

Several storage descriptions had drifted from the implementation, and entire engines (trace span store, sidx, property storage, measure index-mode) had no diagram at all. The clearest gap: `Measure` exposes one API, but `index_mode` silently swaps the entire storage engine, and this was under-documented.

### Code-verified corrections

- **On-disk hierarchy is `group → segment → shard → part`** (segment is the parent of shard). Fixed the inverted order in `tsdb.md`, `data-model.md`, `clustering.md`, and `disk-management.md`, including the `dump` CLI path examples (`<group>/<segment>/shard-0`).
- **Measure field-values file is `fv.bin`**, not `fields.bin`.
- **`GORILLA`/`ZSTD` are schema-level enum hints**; the columnar engine encodes with delta / delta-of-delta / dictionary and applies zstd based on a size threshold. The Gorilla XOR encoder is dead code. Clarified in `data-model.md` and the new doc.
- **Stream** tag columns *are* encoded (the old "no encoding process" wording was misleading); the element inverted index maps terms to **element IDs** (not timestamps).
- **Measure `index_mode`** documented as a two-engine split (columnar parts vs. inverted-index-only) behind an identical API.
- **Trace** span-store layout documented (`spans.bin`, flat `<tag>.t`/`.tm`, `tag.type`, `traceID.filter`) plus the embedded **sidx** ordered index; trace has no per-tag block-skip filter.
- **Property** clarified as a Bluge document store (no segment/time dimension), mutable via append + tombstone with a per-segment deleted drop-set; fixed the repair Merkle-tree SHA input (delete **timestamp**, not a boolean) and the snapshot-ID change-detection state (`state.json`), and documented the on-disk repair files.

### New doc

`docs/concept/storage-and-format.md`, an API-first reference covering:

- the two storage families;
- directory hierarchy & part lifecycle;
- **Measure** (two modes) + **TopN**;
- **Stream**;
- **Trace** + **sidx**;
- **Property**;
- shared encoding primitives;
- the distributed chunked-sync wire format;
- failed-parts handling.

Added to `menu.yml`.

### Diagrams

Replaced the file-structure PNGs with inline **mermaid** (renders in the docs site and in IDEs), and refreshed the data-model structure diagram to include **Trace** (the old `structure.png` was missing it).

## Test plan

- Docs-only change: no source or generated files are touched (Go build/lint/test and license headers are unaffected; `.md` is excluded from the license check).
- Verified that all mermaid fences/subgraphs are balanced, cross-document links resolve, and the `group → segment → shard → part` hierarchy is consistent across all docs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
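The corrections above note that the columnar engine uses delta / delta-of-delta / dictionary encoding rather than Gorilla XOR. As an illustration of the delta-of-delta idea only, here is a generic Go sketch applied to a column of regularly spaced values such as timestamps. The function names are hypothetical, and this is not BanyanDB's encoder: it omits dictionary encoding, any bit packing, and the size-threshold zstd step.

```go
package main

import "fmt"

// encodeDeltaOfDelta keeps the first value verbatim, then stores the
// difference between each consecutive delta. prevDelta starts at 0, so the
// second output element is simply the first delta. For evenly spaced
// values, most outputs become 0, which a later packing stage can shrink.
func encodeDeltaOfDelta(values []int64) []int64 {
	if len(values) == 0 {
		return nil
	}
	out := make([]int64, 0, len(values))
	out = append(out, values[0])
	var prevDelta int64
	for i := 1; i < len(values); i++ {
		delta := values[i] - values[i-1]
		out = append(out, delta-prevDelta)
		prevDelta = delta
	}
	return out
}

// decodeDeltaOfDelta reverses encodeDeltaOfDelta by re-accumulating the
// deltas and then the values.
func decodeDeltaOfDelta(encoded []int64) []int64 {
	if len(encoded) == 0 {
		return nil
	}
	out := make([]int64, 0, len(encoded))
	out = append(out, encoded[0])
	var delta int64
	for i := 1; i < len(encoded); i++ {
		delta += encoded[i]
		out = append(out, out[i-1]+delta)
	}
	return out
}

func main() {
	ts := []int64{1000, 1010, 1020, 1030, 1041}
	enc := encodeDeltaOfDelta(ts)
	fmt.Println(enc)                     // [1000 10 0 0 1]
	fmt.Println(decodeDeltaOfDelta(enc)) // [1000 1010 1020 1030 1041]
}
```

A steady 10-unit interval collapses to zeros, and the single 11-unit gap shows up as a `1`. This is why delta-of-delta suits regularly sampled columns without needing an XOR float encoder.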
