mrproliu opened a new pull request, #1180:
URL: https://github.com/apache/skywalking-banyandb/pull/1180

   ## Summary
   
   During lifecycle tier migration (hot → warm → cold), each row's 
measure/stream schema is re-resolved from the registry to rebuild its write 
request. If that schema was **deleted from the registry** (for example, when an 
upstream OAP metric is renamed or removed), the row can no longer be written to 
any target. Previously this **aborted the whole group**, so the healthy data in 
that group never moved forward either.
   
   This PR makes those **orphan** rows non-fatal. The migration now detects a 
row whose schema is gone, skips it so the rest of the group still migrates, and 
— by default — **archives** the row to a self-describing file so an operator 
can still recover it. The orphan policy is configurable: `archive` (default) 
keeps the data, `discard` drops it.
   
   ## Motivation
   
   A real production migration failed with:
   
   ```
   measure schema sw_metricsHour/meter_banyandb_instance_disk_usage_all_hour 
not found in group snapshot
   ```
   
   The metric had been renamed upstream and its schema removed from the 
registry, but its on-disk data still lived in a source segment. Rather than 
moving the rest of the (healthy) group forward, the migration aborted the 
entire group. Orphan handling turns this localized, expected condition into a 
recoverable, observable event instead of a hard stop.
   
   ## What changed
   
   ### Detection
   
   - A row whose schema is absent from the registry is classified as an orphan 
(`errOrphanSchema`):
     - **Measure**: the replayer pre-fetches all measure schemas once at 
construction (`ListMeasure`); a name missing from that consistent snapshot is 
an orphan.
     - **Stream**: `loadSchema` resolves per subject via `GetStream`; only a 
genuine `schema.ErrGRPCResourceNotFound` is treated as orphan. **Any other 
error (network, closed registry, context cancellation) stays fatal**, so a 
transient registry failure is never mistaken for a droppable orphan.
   - Orphan skips and series-index-gap skips (`errSkipSeries`) are distinct, 
mutually-exclusive sentinels routed via a `skipError.kind`, and are tallied in 
separate counters.
   
   ### Policy: `--migration-orphan-policy` (`archive` | `discard`, default 
`archive`)
   
   - **`archive`**: each orphan row is written as one JSON line, 
**self-describing** — decoded from the part's own column types with no registry 
schema needed. It carries the group, catalog, measure/stream name, source 
location (stage/segment/shard/part), series id, entity, timestamp (RFC3339 + 
epoch-nanos), tags, and — for measures — `version`, `indexed_tags`, and 
`fields`; streams carry `element_id` instead and omit `version`. Per-part JSONL 
is **gzip-compressed** on disk (highly repetitive rows compress ~37×). A 
per-segment `manifest.json` indexes which deleted subjects were archived and 
their row counts; it is written atomically (write-tmp + rename) so a crash 
never leaves a truncated manifest. Re-replaying a part rewrites its file 
idempotently (no double-counting on resume).
   - **`discard`**: orphan rows are dropped and surfaced only in the report; 
nothing is written to disk.
   
   ### Archive location: `--migration-orphan-archive-subdir` (relative, default 
`archive`)
   
   The archive lives in a **relative subdirectory under each catalog's own root 
path**, not a separate absolute directory:
   
   ```
   <catalog-root-path>/<subdir>/<group>/seg-<segment-suffix>/shard-<id>/part-<part-id>.jsonl.gz
   <catalog-root-path>/<subdir>/<group>/seg-<segment-suffix>/manifest.json
   ```
   
   So measure orphans land under `<measure-root-path>/archive/...` and stream 
orphans under `<stream-root-path>/archive/...`. The catalog is **not** a path 
level (it is recorded inside every record and manifest instead). This means the 
archive shares the durability of the volume that already holds the catalog data 
— no separate path to provision, and no ephemeral `/tmp` default. The flag must 
be relative (validated at startup).
   
   ### Source-segment retention decoupling
   
   The migration deletes source segments after a successful run. This PR 
separates two skip reasons:
   
   - **series-index gap** (`errSkipSeries`): a series could not be 
resolved/rebuilt, so its rows remain **only** in the source — the source 
segment is **retained** (excluded from the post-migration delete set) to avoid 
permanent data loss.
   - **orphan** (`errOrphanSchema`): the rows are archived (or discarded) and 
the schema is gone, so the source segment is **deleted normally**.
   
   `excludeRetainedSuffixes` removes the retained segment suffixes from the 
delete candidate list at the end of migration.
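
A minimal sketch of that filtering step. The function name comes from this PR, but the signature and types below are assumptions made for illustration.

```go
package main

import "fmt"

// excludeRetainedSuffixes returns the delete candidates that are not marked
// as retained, preserving their original order. Signature is assumed.
func excludeRetainedSuffixes(candidates []string, retained map[string]struct{}) []string {
	kept := make([]string, 0, len(candidates))
	for _, suffix := range candidates {
		if _, ok := retained[suffix]; !ok {
			kept = append(kept, suffix)
		}
	}
	return kept
}

func main() {
	candidates := []string{"20260601", "20260602", "20260603"}
	// 20260602 hit a series-index gap, so its source segment must survive.
	retained := map[string]struct{}{"20260602": {}}
	fmt.Println(excludeRetainedSuffixes(candidates, retained)) // [20260601 20260603]
}
```

Segments that only contained orphans never enter the retained set, so they fall through to the normal delete path.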
   
   ### Reporting
   
   Orphan handling is **expected behavior, not a migration error**. Instead of 
polluting the report's `errors` buckets (and inflating part-error counts), the 
migration report now carries a dedicated `orphans` section:
   
   ```json
   "orphans": {
     "policy": "archive",
     "measure": { "sw_metricsHour": { "meter_..._hour": 1234 } },
     "stream":  { "<group>": { "<stream>": 56 } }
   }
   ```
   
   Counts are tracked per deleted subject, persisted in progress, and 
accumulate across resume cycles.
   
   ### Safety
   
   - A failed archive write under the `archive` policy is **fatal**: the part 
aborts, the source segment is retained, and resume retries the whole part. An 
orphan row is never silently dropped due to an archive I/O error.
   - **Trace is not covered**: a trace group has a single schema, so a deleted 
trace schema is a whole-group concern rather than a per-series orphan within a 
surviving group.
   
   ## Configuration
   
   | Flag | Description | Default |
   | --- | --- | --- |
   | `--migration-orphan-policy` | What to do with rows whose schema was 
deleted from the registry: `archive` or `discard` | `archive` |
   | `--migration-orphan-archive-subdir` | Relative subdirectory, under each 
catalog's root path, where orphan rows are archived when policy is `archive` | 
`archive` |
   
   ## Reading the archive
   
   The `manifest.json` files are plain text. The per-part data is 
gzip-compressed JSON Lines:
   
   ```bash
   # inspect one part's rows (Linux: zcat; macOS: gzcat)
   gunzip -c <measure-root-path>/archive/<group>/seg-20260601/shard-0/part-000000000000003b.jsonl.gz | jq .
   
   # what was archived for a segment (counts per deleted subject)
   jq '{total_rows, total_series, measures:[.measures[]|{measure,rows}]}' .../seg-20260601/manifest.json
   ```
   
   ## Testing
   
   **Unit tests** (`banyand/backup/lifecycle`):
   
   - Archive write + gzip round-trip + manifest tallies (rows, distinct-series 
union across parts).
   - Idempotent resume (re-replay rewrites the part file and does not 
double-count).
   - `discard` policy writes nothing to disk.
   - Archive write/open failure is fatal and retains the source segment.
   - Orphan vs sidx-gap counter separation; per-subject orphan counts.
   - **Transient registry error is NOT classified as orphan** (stream) — guards 
the data-loss boundary.
   - Report `orphans` section shape and `Progress.AddOrphanRows` aggregation.
   - Path layout (`<root>/<subdir>/<group>/seg-.../...`, no catalog level).
   
   **Distributed e2e** (`test/cases/lifecycle/orphan.go`, measure + stream):
   
   Runs the real lifecycle command with the archive policy on the distributed 
lifecycle cluster. Each spec creates a group with two resources (one to delete, 
one to keep), writes rows that straddle the segment boundary, deletes one 
schema, runs the migration, then asserts:
   
   1. every archived record is correct (measure carries `fields`; stream 
carries `element_id` and no fields), and the manifest row count matches;
   2. the orphan source segment is deleted;
   3. the kept resource migrated and is queryable on the warm stage.
   
   A pre-migration registry-settle gate (poll until the kept resource is 
resolvable and the deleted one is gone) avoids a flaky race where a 
not-yet-propagated kept schema could be misread as orphan.
   
   ## Limitations and operations notes
   
   - The archive directory is **not** cleaned up automatically; prune it once 
the data is no longer needed.
   
   

