hanahmily opened a new issue, #13863:
URL: https://github.com/apache/skywalking/issues/13863

   Surfaced during triage of #13861. Filing separately because it's an 
independent footgun that doesn't depend on (and isn't fixed by) #13862.
   
   ## Background
   
   In the #13861 thread I claimed BanyanDB 0.10.1 should refuse to load 0.9.0 
data files because the segment file version 1.3.0 (used in 0.9) is not in 
`banyand/internal/storage/versions.yml` (which on 0.10.1 is `[1.4.0]`, on 
`main` is `[1.4.0, 1.5.0]`). The reporter's controlled fresh-disk experiment 
subsequently proved that #13861's panic is unrelated to legacy file format — it 
reproduces on disks that 0.9 never touched. So the version-check question is 
*separate* from #13861's root cause, but the original observation still stands: 
in the reporter's environment, an in-place 0.9.0 → 0.10.1 upgrade *did* start 
successfully. That should not be possible if the version check is functioning.
   
   Reading the code, I think I see why — and it's worse than "the check didn't 
fire": **the segment loader silently `MustRMAll`s any segment whose metadata 
file is missing or empty.**
   
   ## Code path
   
   `banyand/internal/storage/segment.go::segmentController.open()` (lines 
516-564):
   
   ```go
   err := loadSegments(sc.location, segPathPrefix, sc, 
sc.getOptions().SegmentInterval, func(start, end time.Time) error {
       ...
       metadataPath := path.Join(segmentPath, metadataFilename)            // 
<segPath>/metadata
       rawMeta, readErr := sc.lfs.Read(metadataPath)
       if readErr != nil {
           if errors.Is(readErr, fs.ErrNotExist) {
               invalidSegments = append(invalidSegments, segmentPath)        // 
(A) missing → "invalid"
               return nil
           }
           return readErr
       }
       if len(rawMeta) == 0 {
           invalidSegments = append(invalidSegments, segmentPath)            // 
(B) empty → "invalid"
           return nil
       }
       meta, parseErr := readSegmentMeta(rawMeta)
       if parseErr != nil {
           return parseErr                                                   // 
(C) unparseable / wrong version → propagate
       }
       ...
   })
   if len(invalidSegments) > 0 {
       sc.l.Warn().Strs("segments", invalidSegments).Msg("invalid segments 
found (empty or epoch-dated), removing them")
       for i := range invalidSegments {
           sc.lfs.MustRMAll(invalidSegments[i])                              // 
← silent data loss
       }
   }
   ```
   
   So:
   
   - **(A) Metadata file missing** → segment dir silently deleted 
(`MustRMAll`), only a `Warn` log.
   - **(B) Metadata file empty** → silently deleted, only a `Warn` log.
   - **(C) Metadata file present, version 1.3.0 (or any other unsupported)** → 
`errVersionIncompatible` returned and propagated up; bootstrap fails. ✓ correct.
   
   The reporter's 0.9.0 → 0.10.1 upgrade hit (A) or (B) rather than (C), 
depending on whether 0.9.0 wrote a `metadata` file at all and what shape it 
had. That's why the upgrade *appeared* to succeed — segments from 0.9 were 
silently destroyed and a new tree was built on top.
   
   ## Why this is a bug independent of version-check
   
   The "delete on invalid metadata" branch is wrong for many reasons that have 
nothing to do with version compatibility:
   
   1. **Silent data loss.** Any operational hiccup that produces an unreadable 
`metadata` file (permission error temporarily masked by another caller, partial 
write from a previous crash before the metadata fsync, manual operator 
inspection that briefly moves the file aside) destroys the entire segment with 
only a `Warn` log. The user has no signal that data is being deleted.
   2. **Version-mismatch gate is bypassed.** A core function of the metadata 
file is to gate compatibility — but the missing/empty branches make that gate 
disappear. An operator running an unsupported upgrade gets a clean-looking 
"fresh database" instead of a "your data is incompatible, here's how to 
migrate" error.
   3. **Crash-loop amplification.** If a writer crashes between segment dir 
creation and `metadata` file write (a window that exists in 
`segmentController.create`, lines 599-621: `MkdirPanicIfExist` → 
`CreateLockFile` → `Write`, with no fsync sequence), the next startup deletes 
that segment.
   4. **`MustRMAll` panics on permission/IO errors**, so an operator who 
debug-protected `/data` (e.g., `chmod -R a-w` to inspect) gets a hard panic 
during bootstrap on top of the silent-deletion path.
   
   ## Proposed fix
   
   Two changes:
   
   1. **Stop deleting "invalid" segments.** Replace the `MustRMAll` with one of:
      - **Refuse to start** with a clear error: `"segment <path> has 
missing/unreadable/incompatible metadata; manual intervention required"`. This 
is the safest default for a database — never auto-delete data the operator 
didn't ask to delete.
      - Or **quarantine** the segment dir by renaming it to 
`<segPath>.corrupt-<timestamp>` and emit a metric 
(`banyandb_corrupt_segment_total{reason=...}`) + WARN log. Operator can 
review/delete manually.
   
   2. **Treat missing/empty `metadata` as a real error, not as a license to 
delete.** Empty/missing metadata almost always indicates incomplete write or an 
unsupported predecessor version — both cases the operator needs to be told 
about explicitly.
   
   For the original 0.9 → 0.10 upgrade scenario specifically: if 0.9 wrote no 
`metadata` file at all, the new behavior surfaces a clean "found segment dir 
without metadata file, refusing to load" error. The operator can then check the 
upgrade docs (or file an issue, like #13861 turned out to be).
   
   Optionally, a third fix:
   
   3. **Tighten `segmentController.create`'s metadata write durability.** 
`CreateLockFile` → `Write` → return without explicit fsync of either the file 
or the directory means a crash between segment-dir creation and metadata commit 
leaves the (A) state. After fix (1) this is no longer catastrophic 
(refuse-to-start instead of silent delete), but the fsync gap is the same one 
#13862 audits for the merge path; the fix shape (`fsync` data → atomic rename → 
`fsync` dir) applies here too.
   
   ## Scope
   
   `banyand/internal/storage/segment.go::open` is shared by all storage engines 
(`trace`, `measure`, `stream`, `sidx`), so one fix covers all of them.
   
   ## Caveat
   
   I haven't yet checked what 0.9.0 actually wrote in segment dirs (whether a 
`metadata` file existed, whether it had the version inside, whether the format 
was the same key/JSON). That would let us confirm exactly which branch (A or B) 
fired in the reporter's upgrade. Code-only audit is enough for this issue 
though — even if 0.9 wrote a perfect 1.3.0 metadata file and the reporter's 
upgrade hit branch (C), the silent-delete behavior on (A) and (B) is a footgun 
in its own right.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: 
[email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to