wu-sheng opened a new pull request, #1158:
URL: https://github.com/apache/skywalking-banyandb/pull/1158

   ## Summary
   
   Deep-reviewed the storage engines against the **code** (the code is the source of truth; docs and diagrams are reference only), corrected the storage/file-format documentation, and added a new **API-first** storage & file-format reference. Also replaced the file-structure PNG diagrams with inline mermaid.
   
   ### Why
   
   Several storage descriptions had drifted from the implementation, and entire engines (trace span store, sidx, property storage, measure index-mode) had no diagram at all. The clearest gap: `Measure` exposes one API, but `index_mode` silently swaps the entire storage engine underneath it, and this was under-documented.
   
   ### Code-verified corrections
   
   - **On-disk hierarchy is `group → segment → shard → part`** (segment is the parent of shard). Fixed the inverted order in `tsdb.md`, `data-model.md`, `clustering.md`, and `disk-management.md`, including the `dump` CLI path examples (`<group>/<segment>/shard-0`).
   - **Measure field-values file is `fv.bin`**, not `fields.bin`.
   - **`GORILLA`/`ZSTD` are schema-level enum hints**; the columnar engine actually encodes with delta / delta-of-delta / dictionary encoding and applies zstd based on a size threshold. The Gorilla XOR encoder is dead code. Clarified in `data-model.md` and the new doc.
   - **Stream** tag columns *are* encoded (the old "no encoding process" wording was misleading), and the element inverted index maps terms to **element IDs**, not timestamps.
   - **Measure `index_mode`** documented as a two-engine split (columnar parts vs inverted-index-only) behind an identical API.
   - **Trace** span-store layout documented (`spans.bin`, flat `<tag>.t`/`.tm`, `tag.type`, `traceID.filter`), plus the embedded **sidx** ordered index; trace has no per-tag block-skip filter.
   - **Property** clarified as a Bluge document store (no segment/time dimension), mutable via append + tombstone with a per-segment deleted drop-set. Fixed the repair Merkle-tree SHA input (it hashes the delete **timestamp**, not a boolean) and the snapshot-id change-detection state (`state.json`), and documented the on-disk repair files.
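
   To make the corrected nesting concrete, here is a minimal Go sketch that builds a part directory path in `group → segment → shard → part` order. Only the order and the `shard-0` form come from this PR; the `partPath` helper, the segment name, and the hex part-directory format are hypothetical placeholders, not BanyanDB's real naming scheme.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// partPath joins the four on-disk levels in the corrected order:
// group -> segment -> shard -> part (the segment is the parent of the shard).
// Hypothetical helper: the segment name and the hex part-directory format
// are placeholders, not BanyanDB's real naming scheme.
func partPath(root, group, segment string, shard uint32, partID uint64) string {
	return filepath.Join(
		root,
		group,                          // group directory
		segment,                        // time-range segment directory
		fmt.Sprintf("shard-%d", shard), // shard directory, e.g. shard-0
		fmt.Sprintf("%016x", partID),   // part directory (placeholder format)
	)
}

func main() {
	// On Unix this prints /tmp/banyandb/demo_group/seg-20240101/shard-0/0000000000000001
	fmt.Println(partPath("/tmp/banyandb", "demo_group", "seg-20240101", 0, 1))
}
```

   Putting the segment above the shard means a whole time range can be dropped by removing one directory, whichever shards it contains.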
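
   The encoding correction is easier to picture with a concrete transform. Below is a minimal Go sketch of delta-of-delta encoding for a timestamp column, plus a size gate deciding whether compression is worth applying. The function names and the 128-byte `zstdThreshold` are assumptions for illustration; the real encoder also chooses between plain delta and dictionary encoding and calls an actual zstd library, which this stdlib-only sketch omits.

```go
package main

import "fmt"

// zstdThreshold is a hypothetical cut-off: encoded blocks smaller than this
// are stored as-is because compressing them rarely pays off.
const zstdThreshold = 128

// encodeDeltaOfDelta stores the first value, then the first delta, then the
// difference between consecutive deltas. Regular intervals collapse to zeros.
func encodeDeltaOfDelta(vals []int64) []int64 {
	if len(vals) == 0 {
		return nil
	}
	out := []int64{vals[0]}
	if len(vals) == 1 {
		return out
	}
	prevDelta := vals[1] - vals[0]
	out = append(out, prevDelta)
	for i := 2; i < len(vals); i++ {
		d := vals[i] - vals[i-1]
		out = append(out, d-prevDelta)
		prevDelta = d
	}
	return out
}

// decodeDeltaOfDelta reverses encodeDeltaOfDelta.
func decodeDeltaOfDelta(enc []int64) []int64 {
	if len(enc) == 0 {
		return nil
	}
	out := []int64{enc[0]}
	if len(enc) == 1 {
		return out
	}
	delta := enc[1]
	out = append(out, enc[0]+delta)
	for i := 2; i < len(enc); i++ {
		delta += enc[i] // rebuild the current delta from its change
		out = append(out, out[i-1]+delta)
	}
	return out
}

// shouldCompress applies the size-threshold rule to an encoded block size.
func shouldCompress(encodedSize int) bool {
	return encodedSize >= zstdThreshold
}

func main() {
	ts := []int64{1000, 1010, 1020, 1030, 1041} // nearly regular timestamps
	enc := encodeDeltaOfDelta(ts)
	fmt.Println(enc)                     // [1000 10 0 0 1]
	fmt.Println(decodeDeltaOfDelta(enc)) // [1000 1010 1020 1030 1041]
	fmt.Println(shouldCompress(64))      // false
}
```

   Mostly-zero output like this is what makes a follow-up general-purpose compressor effective, which is why a schema-level `GORILLA` hint is not needed to get compact timestamp columns.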
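
   The Stream index correction (terms map to element IDs, not timestamps) can be illustrated with a toy posting-list index. This Go sketch is hypothetical and in-memory only; the `elementIndex` type and the term format are illustrative assumptions, not BanyanDB's on-disk index.

```go
package main

import (
	"fmt"
	"sort"
)

// elementIndex is a toy inverted index. Each term (for example
// "service_id=svc-a") maps to a sorted posting list of element IDs,
// not timestamps. Hypothetical type for illustration only.
type elementIndex map[string][]uint64

// add records that elementID carries term, keeping the list sorted and unique.
func (idx elementIndex) add(term string, elementID uint64) {
	ids := idx[term]
	i := sort.Search(len(ids), func(j int) bool { return ids[j] >= elementID })
	if i < len(ids) && ids[i] == elementID {
		return // already indexed
	}
	ids = append(ids, 0)
	copy(ids[i+1:], ids[i:]) // shift the tail right by one slot
	ids[i] = elementID
	idx[term] = ids
}

// lookup returns the element IDs for a term; in this sketch a caller would
// then resolve those IDs against the element data to read timestamps and tags.
func (idx elementIndex) lookup(term string) []uint64 {
	return idx[term]
}

func main() {
	idx := elementIndex{}
	idx.add("service_id=svc-a", 42)
	idx.add("service_id=svc-a", 7)
	idx.add("service_id=svc-b", 13)
	fmt.Println(idx.lookup("service_id=svc-a")) // [7 42]
}
```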
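
   Because `index_mode` swaps the engine behind an identical API, the relationship is essentially interface dispatch. This Go sketch models it with a hypothetical `measureEngine` interface and two stand-in implementations selected by a boolean flag (modeling `index_mode` as a boolean is itself an assumption here). None of these names come from the BanyanDB codebase, and the real engines do far more than return a label.

```go
package main

import "fmt"

// Point is a simplified measure data point (hypothetical type).
type Point struct {
	Entity string
	TS     int64
	Value  float64
}

// measureEngine is the single write API that both storage modes satisfy.
type measureEngine interface {
	Write(p Point) string
}

// columnarEngine stands in for the default mode: data lands in columnar parts.
type columnarEngine struct{}

func (columnarEngine) Write(Point) string { return "columnar-part" }

// indexOnlyEngine stands in for index_mode: data lives only in the inverted index.
type indexOnlyEngine struct{}

func (indexOnlyEngine) Write(Point) string { return "inverted-index" }

// engineFor picks the backend from the schema flag; callers see the same API.
func engineFor(indexMode bool) measureEngine {
	if indexMode {
		return indexOnlyEngine{}
	}
	return columnarEngine{}
}

func main() {
	p := Point{Entity: "svc-a", TS: 1700000000000, Value: 1.5}
	for _, mode := range []bool{false, true} {
		fmt.Printf("index_mode=%v -> %s\n", mode, engineFor(mode).Write(p))
	}
}
```

   The point of the sketch is that nothing at the call site reveals which engine stored the data, which is exactly why the split needed explicit documentation.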
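
   One plausible reason the Merkle-tree input must be the delete timestamp rather than a boolean: two replicas that deleted the same property at different times should hash differently so the tree comparison flags them. The Go sketch below shows that property with a hypothetical `leafHash`; the field set, the byte layout, and the "0 means not deleted" convention are assumptions, not the actual repair format.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// leafHash is a hypothetical Merkle-tree leaf digest for one property entry.
// It hashes the delete timestamp (0 meaning "not deleted" in this sketch)
// instead of a deleted boolean. Field set and byte layout are assumptions.
func leafHash(id string, version, deleteTS int64) string {
	h := sha256.New()
	var buf [8]byte
	// Length-prefix the ID so different ID/field splits cannot collide.
	binary.BigEndian.PutUint64(buf[:], uint64(len(id)))
	h.Write(buf[:])
	h.Write([]byte(id))
	binary.BigEndian.PutUint64(buf[:], uint64(version))
	h.Write(buf[:])
	binary.BigEndian.PutUint64(buf[:], uint64(deleteTS))
	h.Write(buf[:])
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// The same property deleted at different times on two replicas.
	a := leafHash("group/name/id-1", 3, 1700000000000)
	b := leafHash("group/name/id-1", 3, 1700000005000)
	fmt.Println(a != b) // true: with a plain boolean both leaves would match
}
```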
   
   ### New doc
   
   `docs/concept/storage-and-format.md` is an API-first reference covering: the two storage families; the directory hierarchy and part lifecycle; **Measure** (both modes) and **TopN**; **Stream**; **Trace** and **sidx**; **Property**; shared encoding primitives; the distributed chunked-sync wire format; and failed-parts handling. Added to `menu.yml`.
   
   ### Diagrams
   
   Replaced the file-structure PNGs with inline **mermaid** (renders on the docs site and in IDEs) and refreshed the data-model structure diagram to include **Trace**, which the old `structure.png` was missing.
   
   ## Test plan
   
   - Docs-only change: no source or generated files are touched, so Go build/lint/test and license headers are unaffected (`.md` is excluded from the license check).
   - Verified that all mermaid fences/subgraphs are balanced, cross-document links resolve, and the `group → segment → shard → part` hierarchy is consistent across all docs.
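
   The fence/subgraph balance check can be automated with a simple line-based rule: every code fence must close, and inside a `mermaid` fence every `subgraph` line needs a matching `end`. The `checkMermaid` helper below is a hypothetical Go sketch, not the tooling used for this PR, and it is a heuristic rather than a mermaid parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// fence is the three-backtick code-fence marker, built at runtime so this
// snippet can itself sit inside a Markdown code block.
var fence = strings.Repeat("`", 3)

// checkMermaid is a hypothetical line-based checker: every code fence must be
// closed, and inside a mermaid fence each "subgraph" line needs an "end" line.
func checkMermaid(doc string) error {
	inFence, inMermaid, depth := false, false, 0
	sc := bufio.NewScanner(strings.NewReader(doc))
	for line := 1; sc.Scan(); line++ {
		t := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(t, fence) && !inFence:
			// Opening fence: remember whether this is a mermaid block.
			inFence = true
			inMermaid = strings.TrimSpace(t[len(fence):]) == "mermaid"
			depth = 0
		case strings.HasPrefix(t, fence):
			// Closing fence: every subgraph opened in this block must be closed.
			if depth != 0 {
				return fmt.Errorf("line %d: %d unclosed subgraph(s)", line, depth)
			}
			inFence, inMermaid = false, false
		case inMermaid && strings.HasPrefix(t, "subgraph"):
			depth++
		case inMermaid && t == "end":
			depth--
			if depth < 0 {
				return fmt.Errorf("line %d: end without subgraph", line)
			}
		}
	}
	if err := sc.Err(); err != nil {
		return err
	}
	if inFence {
		return fmt.Errorf("unclosed code fence")
	}
	return nil
}

func main() {
	doc := fence + "mermaid\nflowchart TD\nsubgraph segment\nsubgraph shard\nend\nend\n" + fence + "\n"
	fmt.Println(checkMermaid(doc)) // <nil>
}
```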
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.