This is an automated email from the ASF dual-hosted git repository.

vhs pushed a commit to branch rfc-blob-cleaner
in repository https://gitbox.apache.org/repos/asf/hudi.git
commit 2b0aa201928f04d37e8ab20aa8d464a9b1db10b7
Author: voon <[email protected]>
AuthorDate: Fri Mar 20 01:33:06 2026 +0800

    add log merged state orphaned blob edge case
---
 rfc/rfc-100/rfc-100-blob-cleaner-design.md | 246 +++++++++++++++++++++--------
 rfc/rfc-100/rfc-100-blob-cleaner.md        | 115 ++++++++++----
 2 files changed, 266 insertions(+), 95 deletions(-)

diff --git a/rfc/rfc-100/rfc-100-blob-cleaner-design.md b/rfc/rfc-100/rfc-100-blob-cleaner-design.md
index 76b78bcc0af8..ddb895859ba4 100644
--- a/rfc/rfc-100/rfc-100-blob-cleaner-design.md
+++ b/rfc/rfc-100/rfc-100-blob-cleaner-design.md
@@ -154,16 +154,47 @@ MDT secondary index. The dispatch mechanism is a zero-cost string prefix check o
 | MOR strategy | Over-retain (union of base + log refs) | Safe (C5, R4); cleaned after compaction |
 | Container strategy | Tuple-level tracking; delete only when all ranges dead | Correct (C4, R3); partial containers flagged for blob compaction |
 
-> **DIAGRAM 1: Architecture Overview**
->
-> *Block diagram showing how blob cleanup fits into the existing cleaner pipeline.*
->
-> Shows: `CleanPlanActionExecutor.requestClean()` → `CleanPlanner` (per-partition, per-FG iteration)
-> → **Stage 1** (per-FG blob ref collection + set difference + dispatch) → **Stage 2** (MDT
-> secondary index lookup for external candidates) → **Stage 3** (container lifecycle resolution) →
-> `HoodieCleanerPlan` (with `blobFilesToDelete` + `containersToCompact`) → `CleanActionExecutor`
-> (parallel blob file deletion). The existing file slice deletion path runs alongside the blob
-> deletion path within the same clean action.
+```mermaid
+flowchart LR
+    subgraph Planning["CleanPlanActionExecutor.requestClean()"]
+        direction TB
+        Gate{"hasBlobColumns()?"}
+        Gate -- No --> Skip["Skip blob cleanup<br/>(zero cost)"]
+        Gate -- Yes --> CP
+
+        subgraph CP["CleanPlanner (per-partition, per-FG)"]
+            direction TB
+            Policy["Policy method<br/>→ FileGroupCleanResult<br/>(expired + retained slices)"]
+            S1["<b>Stage 1</b><br/>Per-FG blob ref<br/>set difference + dispatch"]
+            Policy --> S1
+        end
+
+        S1 --> S2["<b>Stage 2</b><br/>Cross-FG verification<br/>(MDT secondary index)"]
+        S1 -->|hudi_blob_deletes| S3
+        S2 -->|external_deletes| S3["<b>Stage 3</b><br/>Container lifecycle<br/>resolution"]
+    end
+
+    subgraph Plan["HoodieCleanerPlan"]
+        FP["filePathsToBeDeleted<br/>(existing)"]
+        BP["blobFilesToDelete<br/>(new)"]
+        CC["containersToCompact<br/>(new)"]
+    end
+
+    S3 --> BP
+    S3 --> CC
+    CP --> FP
+
+    subgraph Execution["CleanActionExecutor.runClean()"]
+        direction TB
+        DF["Delete file slices<br/>(existing, parallel)"]
+        DB["Delete blob files<br/>(new, parallel)"]
+        RC["Record containers<br/>for blob compaction"]
+    end
+
+    FP --> DF
+    BP --> DB
+    CC --> RC
+```
 
 ---
 
@@ -182,20 +213,28 @@ Output: hudi_blob_deletes -- blobs safe to delete immediately
 
 for each file_group being cleaned:
 
-    // Collect expired blob refs
+    // Collect expired blob refs (base files + log files)
+    // Must read log files: blob refs introduced and superseded within the log
+    // chain before compaction would otherwise become permanent orphans.
     expired_refs = Set<(path, offset, length)>()
     for slice in expired_slices:
-        for ref in extractBlobRefs(slice): // base: columnar projection; log: field extraction
+        for ref in extractBlobRefs(slice.baseFile): // columnar projection
+            if ref.type == OUT_OF_LINE and ref.managed == true:
+                expired_refs.add((ref.path, ref.offset, ref.length))
+        for ref in extractBlobRefs(slice.logFiles): // full record read
             if ref.type == OUT_OF_LINE and ref.managed == true:
                 expired_refs.add((ref.path, ref.offset, ref.length))
 
     if expired_refs is empty:
-        continue // no blob work for this FG
+        continue // no blob work for this FG
 
-    // Collect retained blob refs
+    // Collect retained blob refs (base files only)
+    // Cleaning is fenced on compaction: retained base files contain the merged
+    // state. Log reads are unnecessary -- any shadowed base ref causes safe
+    // over-retention, cleaned after the next compaction cycle.
     retained_refs = Set<(path, offset, length)>()
-    for slice in retained_slices: // includes base + log files (MOR)
-        for ref in extractBlobRefs(slice):
+    for slice in retained_slices:
+        for ref in extractBlobRefs(slice.baseFile): // columnar projection only
             if ref.type == OUT_OF_LINE and ref.managed == true:
                 retained_refs.add((ref.path, ref.offset, ref.length))
 
@@ -214,10 +253,16 @@ for each file_group being cleaned:
 within the file group that created it. If a blob ref appears in an expired slice but not in any
 retained slice of the same FG, it is globally orphaned. No cross-FG check is needed.
 
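Stage 1's per-FG set difference and prefix dispatch can be sketched in a few lines of Python. `BlobRef`, the sample paths, and the `.hoodie/blobs/` prefix constant here are illustrative stand-ins for the planner's types, not the proposed implementation:

```python
from typing import NamedTuple, Set, Tuple

class BlobRef(NamedTuple):
    path: str
    offset: int
    length: int

# Assumed location of Hudi-managed blobs; external blobs live elsewhere.
HUDI_BLOB_PREFIX = ".hoodie/blobs/"

def stage1_dispatch(expired_refs: Set[BlobRef],
                    retained_refs: Set[BlobRef]) -> Tuple[Set[BlobRef], Set[BlobRef]]:
    """Per-FG set difference, then zero-cost string-prefix dispatch."""
    local_orphans = expired_refs - retained_refs
    hudi_deletes = {r for r in local_orphans if r.path.startswith(HUDI_BLOB_PREFIX)}
    external_candidates = local_orphans - hudi_deletes  # flow to Stage 2
    return hudi_deletes, external_candidates

expired = {BlobRef(".hoodie/blobs/c1.bin", 0, 4096),   # Hudi-created, orphaned
           BlobRef(".hoodie/blobs/c2.bin", 0, 4096),   # still referenced
           BlobRef("s3://ext/img.bin", 0, 4096)}       # external, needs Stage 2
retained = {BlobRef(".hoodie/blobs/c2.bin", 0, 4096)}
deletes, candidates = stage1_dispatch(expired, retained)
```

A ref present in both sets is retained; of the orphans, only the Hudi-created one is immediately deletable, while the external one flows to Stage 2.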
+**Why correct for MOR (C5, R4).** Two asymmetric read strategies:
+
+- **Expired slices: base + log files.** Log files must be read because blob refs can be introduced
+  and superseded entirely within the log chain before compaction (e.g., `log@t2: row1→blob_B`,
+  `log@t3: row1→blob_C`). After compaction, `blob_B` exists only in the expired log. Skipping it
+  would create a permanent orphan (R2 violation).
+- **Retained slices: base files only.** Since cleaning is fenced on compaction, retained base files
+  contain the merged state. Any blob ref shadowed by an uncompacted log on top of the retained
+  slice appears in the retained set via the base file -- this causes over-retention (safe, never
+  premature deletion). The shadowed ref is cleaned after the next compaction cycle.
 
 **Why correct for savepoints (C9).** The existing cleaner excludes savepointed file slices from
 the expired set. Blob cleanup inherits this: savepointed slices are always in the retained set.
 
@@ -227,15 +272,18 @@ and `expired_slices` is all slices. For Hudi-created blobs, all are safe to dele
 creates new blobs in the target FG via F8). For external blobs, all flow to Stage 2 for cross-FG
 verification (clustering copies the pointer via F9, so Stage 2 finds the reference in the target
 FG).
 
-> **DIAGRAM 2: Stage 1 Flow**
->
-> *Flowchart showing the per-file-group blob cleanup logic.*
->
-> Shows: File Group → Extract expired blob refs → Extract retained blob refs → Set difference
-> (expired - retained) → local_orphans → Path prefix check → Hudi-created? → YES: add to
-> `hudi_blob_deletes` (safe to delete) / NO: add to `external_candidates` (needs Stage 2).
-> Annotate the Hudi-created branch with "P3: no cross-FG refs" and the external branch with
-> "C13: cross-FG sharing possible".
+```mermaid
+flowchart TD
+    FG["File Group being cleaned"]
+    FG --> Exp["Extract blob refs from<br/><b>expired</b> slices<br/>(base files + log files)"]
+    Exp --> Empty{"expired_refs<br/>empty?"}
+    Empty -- Yes --> Done["Skip FG<br/>(no blob work)"]
+    Empty -- No --> Ret["Extract blob refs from<br/><b>retained</b> slices<br/>(base files only —<br/>fenced on compaction)"]
+    Ret --> Diff["Set difference:<br/><code>local_orphans = expired - retained</code>"]
+    Diff --> Check{"Path starts with<br/><code>.hoodie/blobs/</code>?"}
+    Check -- "Yes (Hudi-created)" --> Hudi["Add to <b>hudi_blob_deletes</b><br/>✓ Safe to delete immediately<br/><i>P3: no cross-FG refs possible</i>"]
+    Check -- "No (External)" --> Ext["Add to <b>external_candidates</b><br/>→ Needs Stage 2 verification<br/><i>C13: cross-FG sharing possible</i>"]
+```
 
 ### Stage 2: Cross-FG Verification (External Blobs)
 
@@ -314,15 +362,32 @@ a bottleneck on large tables. The operator is warned to enable the MDT secondary
 
 | No index, few candidates | Table scan | O(candidates * table) | Small tables, few shared blobs |
 | No index, many candidates | Circuit break | Zero (deferred) | Large tables -- index required |
 
-> **DIAGRAM 3: Stage 2 Flow (MDT Secondary Index Path)**
->
-> *Sequence diagram showing the two-hop lookup.*
->
-> Shows: Cleaner → MDT Secondary Index: batched prefix scan with candidate paths → returns
-> `Map<path, List<recordKey>>` → for each path: Cleaner → MDT Record Index: lookup record key →
-> returns `(partition, fileId)` → check: fileId in cleaned_fg_ids? → YES: try next record key /
-> NO: **short-circuit** → blob is live, retain. If all record keys resolve to cleaned FGs → blob
-> is globally orphaned → add to external_deletes.
+```mermaid
+sequenceDiagram
+    participant C as Cleaner (Stage 2)
+    participant SI as MDT Secondary Index
+    participant RI as MDT Record Index
+
+    C->>SI: Batched prefix scan<br/>candidate_paths (N paths)
+    SI-->>C: Map<path, List<recordKey>>
+
+    loop For each candidate path
+        alt No record keys returned
+            Note right of C: Globally orphaned → DELETE
+        else Has record keys
+            loop For each record key (short-circuit)
+                C->>RI: Lookup record key
+                RI-->>C: (partition, fileId)
+                alt fileId NOT in cleaned_fg_ids
+                    Note right of C: Live reference found<br/>SHORT-CIRCUIT → RETAIN
+                end
+            end
+            alt All record keys in cleaned FGs
+                Note right of C: Globally orphaned → DELETE
+            end
+        end
+    end
+```
 
 ### Stage 3: Container File Lifecycle
 
@@ -374,16 +439,48 @@ from Stage 1 are sufficient -- no cross-FG check needed for container ranges.
 
 └── Transition to COMPLETED
 ```
 
-> **DIAGRAM 4: End-to-End Execution Lifecycle**
->
-> *Sequence/timeline diagram showing the plan-execute-complete lifecycle.*
->
-> Shows: `requestClean()` → compute file slice deletes + blob deletes (Stages 1-3) →
-> persist `HoodieCleanerPlan` → **REQUESTED** state on timeline → `runClean()` → transition to
-> **INFLIGHT** → delete file slices (parallel) + delete blob files (parallel) → build
-> `HoodieCleanMetadata` (including `blobCleanStats`) → transition to **COMPLETED**. Annotate
-> crash recovery points: crash before REQUESTED = restart fresh; crash during INFLIGHT = re-execute
-> plan (idempotent); crash after delete but before COMPLETED = re-execute (no-op deletes).
+```mermaid
+sequenceDiagram
+    participant P as CleanPlanActionExecutor
+    participant TL as Timeline
+    participant E as CleanActionExecutor
+    participant S as Storage
+
+    Note over P: requestClean()
+
+    P->>P: Stage 1: per-FG blob ref set difference
+    P->>P: Stage 2: MDT index lookup (if external candidates)
+    P->>P: Stage 3: container lifecycle resolution
+    P->>TL: Persist HoodieCleanerPlan
+
+    Note over TL: REQUESTED
+
+    rect rgb(255, 245, 230)
+        Note right of TL: Crash here → restart fresh<br/>(no plan persisted yet)
+    end
+
+    E->>TL: Transition plan state
+    Note over TL: INFLIGHT
+
+    rect rgb(255, 245, 230)
+        Note right of TL: Crash here → re-execute plan<br/>(idempotent: FileNotFound = success)
+    end
+
+    par Parallel deletion
+        E->>S: Delete file slices (existing)
+    and
+        E->>S: Delete blob files (new)
+    end
+
+    E->>E: Build HoodieCleanMetadata<br/>(+ blobCleanStats)
+    E->>TL: Transition plan state
+
+    rect rgb(255, 245, 230)
+        Note right of TL: Crash here → re-execute<br/>(all deletes are no-ops)
+    end
+
+    Note over TL: COMPLETED
+```
 
 ---
 
@@ -482,23 +579,42 @@ check: metadata reads -- negligible compared to the commit's own I/O.
 
 False positives (unnecessary rejections) are rare and handled by existing retry logic.
 
-> **DIAGRAM 5: Writer-Cleaner Concurrency Timeline**
->
-> *Timeline diagram showing two parallel timelines (writer and cleaner) with four scenarios.*
->
-> **Scenario A:** Writer commits before cleaner plans → cleaner sees the reference in a retained
-> slice → blob not deleted. **Safe.**
->
-> **Scenario B:** Writer commits after cleaner plans, before cleaner deletes → cleaner is in
-> REQUESTED/INFLIGHT state → writer's `preCommit()` reads the plan, finds intersection →
-> `HoodieWriteConflictException` → writer retries. **Safe -- conflict detected.**
->
-> **Scenario C:** Writer commits after cleaner deletes, before cleaner transitions to COMPLETED →
-> cleaner is still INFLIGHT → writer's `preCommit()` reads the INFLIGHT plan → rejection.
-> **Safe -- same as B.**
->
-> **Scenario D:** Cleaner transitions to COMPLETED, then writer acquires lock → COMPLETED clean
-> metadata visible on timeline → writer's check reads `deletedBlobFilePaths` → rejection. **Safe.**
+```mermaid
+sequenceDiagram
+    participant W as Writer
+    participant TL as Timeline
+    participant CL as Cleaner
+
+    Note over W,CL: Scenario A: Writer commits BEFORE cleaner plans
+
+    W->>TL: Commit (references blob_X)
+    CL->>TL: Plan cleanup
+    Note right of CL: Sees blob_X in retained slice → not deleted
+    Note over W,CL: ✓ Safe
+
+    Note over W,CL: Scenario B: Writer commits AFTER cleaner plans, BEFORE delete
+
+    CL->>TL: Plan cleanup (blob_X in blobFilesToDelete)
+    Note over TL: REQUESTED / INFLIGHT
+    W->>TL: preCommit() — reads clean plan
+    Note left of W: Intersection found!<br/>HoodieWriteConflictException<br/>→ Writer retries
+    Note over W,CL: ✓ Safe — conflict detected
+
+    Note over W,CL: Scenario C: Writer commits AFTER cleaner deletes, BEFORE COMPLETED
+
+    CL->>CL: Delete blob_X from storage
+    Note over TL: Still INFLIGHT
+    W->>TL: preCommit() — reads INFLIGHT plan
+    Note left of W: blob_X in blobFilesToDelete<br/>→ Rejection
+    Note over W,CL: ✓ Safe — same as B
+
+    Note over W,CL: Scenario D: Cleaner completes, THEN writer acquires lock
+
+    CL->>TL: Transition to COMPLETED
+    W->>TL: preCommit() — reads COMPLETED metadata
+    Note left of W: blob_X in deletedBlobFilePaths<br/>→ Rejection
+    Note over W,CL: ✓ Safe
+```
 
 ### Concurrency Matrix
 
diff --git a/rfc/rfc-100/rfc-100-blob-cleaner.md b/rfc/rfc-100/rfc-100-blob-cleaner.md
index 1f81b8c2d242..4e41b596a1aa 100644
--- a/rfc/rfc-100/rfc-100-blob-cleaner.md
+++ b/rfc/rfc-100/rfc-100-blob-cleaner.md
@@ -161,20 +161,31 @@ Output: hudi_blob_deletes -- blobs safe to delete immediately
 
 for each file_group being cleaned:  // from refactored CleanPlanner
 
-    // --- Collect expired blob refs ---
+    // --- Collect expired blob refs (base files + log files) ---
+    // Must read log files from expired slices: blob refs introduced and superseded
+    // within the log chain before compaction would otherwise become permanent orphans.
+    // Example: log@t2 adds blob_B, log@t3 supersedes with blob_C, compaction@t4
+    // produces base with blob_C. blob_B exists only in expired log@t2.
     expired_refs = Set<BlobRef>()  // BlobRef = (path, offset, length)
     for slice in expired_slices:
-        for ref in extractBlobRefs(slice): // base: columnar projection; log: field extraction
+        for ref in extractBlobRefs(slice.baseFile): // columnar projection
+            if ref.type == OUT_OF_LINE and ref.managed == true:
+                expired_refs.add(BlobRef(ref.path, ref.offset, ref.length))
+        for ref in extractBlobRefs(slice.logFiles): // full record read
             if ref.type == OUT_OF_LINE and ref.managed == true:
                 expired_refs.add(BlobRef(ref.path, ref.offset, ref.length))
 
     if expired_refs is empty:
         continue // no blob work for this FG
 
-    // --- Collect retained blob refs ---
+    // --- Collect retained blob refs (base files only) ---
+    // Cleaning is fenced on compaction: retained base files contain the merged state.
+    // Reading only base files may over-retain shadowed refs (a base ref superseded by
+    // an uncompacted log on top). This is safe -- over-retention is always preferred
+    // over premature deletion. Shadowed refs are cleaned after the next compaction.
     retained_refs = Set<BlobRef>()
-    for slice in retained_slices: // includes base + log files (MOR)
-        for ref in extractBlobRefs(slice):
+    for slice in retained_slices:
+        for ref in extractBlobRefs(slice.baseFile): // columnar projection only
             if ref.type == OUT_OF_LINE and ref.managed == true:
                 retained_refs.add(BlobRef(ref.path, ref.offset, ref.length))
 
@@ -193,13 +204,24 @@ for each file_group being cleaned:  // from refactored CleanPl
 the file group that created it. If a blob ref appears in an expired slice but not in any retained
 slice of the same file group, it is globally orphaned. No cross-FG check is needed.
 
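The permanent-orphan edge case that motivates reading expired log files can be reproduced with a toy model of one file group. The slice layout, helper, and blob names below are illustrative, not the RFC's API:

```python
# Toy model: sets of blob refs per file. blob_B is introduced at log@t2 and
# superseded at log@t3 -- it never reaches a base file, so only an
# expired-log read can observe it.
expired_slice = {"base": {"blob_A"}, "logs": [{"blob_B"}, {"blob_C"}]}
retained_slice = {"base": {"blob_C"}, "logs": []}  # compacted merged state

def collect_refs(slice_, include_logs):
    """Gather blob refs from a slice's base file, optionally its log files."""
    refs = set(slice_["base"])
    if include_logs:
        for log_refs in slice_["logs"]:
            refs |= log_refs
    return refs

# Retained side: base files only (fenced on compaction).
retained = collect_refs(retained_slice, include_logs=False)

# Expired side with log reads (correct) vs. base-only (buggy).
orphans_full = collect_refs(expired_slice, include_logs=True) - retained
orphans_base_only = collect_refs(expired_slice, include_logs=False) - retained
```

With log reads, `blob_B` is correctly identified as orphaned; a base-only expired read would miss it, and it would never be deleted.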
-**Why correct for MOR (C5, R4).** For retained slices in a MOR file group, we extract blob refs
-from both the base file and all log files, taking the union. This over-counts: if a log file updates
-a record's blob ref, the base file's old ref appears in the retained set even though it is
-semantically dead. This is safe -- over-retention prevents the premature deletion that would occur if
-the log update were later rolled back. **MOR over-retention is unbounded in duration -- it depends on
-compaction frequency.** For tables with infrequent compaction, blob storage waste from MOR
-over-retention could be significant. This is a known trade-off: correctness over space efficiency.
+**Why correct for MOR (C5, R4).** The read strategy is asymmetric by design:
+
+- **Expired slices: base + log files.** Log files must be read because blob refs can be introduced
+  and superseded entirely within the log chain before compaction. Example: `log@t2` writes
+  `row1→blob_B`, `log@t3` writes `row1→blob_C`, compaction produces a base with `blob_C`. After
+  compaction, `blob_B` exists only in expired `log@t2`. Skipping log reads would make `blob_B` a
+  permanent orphan (R2 violation).
+
+- **Retained slices: base files only.** Cleaning is fenced on compaction, so retained base files
+  contain the merged state. Any blob ref shadowed by an uncompacted log on top of the retained
+  slice still appears in the retained set via the base file -- this causes over-retention (safe).
+  The shadowed ref is cleaned after the next compaction cycle. **Over-retention from this source is
+  bounded by compaction frequency.** For tables with infrequent compaction, blob storage waste from
+  over-retention could be significant. This is a known trade-off: correctness over space efficiency.
+
+This asymmetry also improves performance: retained slices require only columnar base file
+projections (cheap), while the more expensive log file reads are confined to expired slices that
+are being cleaned anyway.
 **Why correct for savepoints (C9).** The existing cleaner already excludes savepointed file slices
 from the expired set (`isFileSliceExistInSavepointedFiles`). Since blob cleanup operates on the same
@@ -796,6 +818,8 @@ BlobCleanResult collectBlobRefsForFileGroup(FileGroupCleanResult fgResult, Stora
     return BlobCleanResult.EMPTY;
   }
 
+  // Expired slices: read base + log files (log files may contain blob refs
+  // introduced and superseded within the log chain before compaction)
   Set<BlobRef> expiredRefs = new HashSet<>();
   for (FileSlice slice : fgResult.getExpiredSlices()) {
     extractManagedOutOfLineBlobRefs(slice).forEach(expiredRefs::add);
@@ -805,9 +829,11 @@ BlobCleanResult collectBlobRefsForFileGroup(FileGroupCleanResult fgResult, Stora
     return BlobCleanResult.EMPTY;
   }
 
+  // Retained slices: read base files only (fenced on compaction -- base files
+  // contain the merged state; skipping logs causes safe over-retention)
   Set<BlobRef> retainedRefs = new HashSet<>();
   for (FileSlice slice : fgResult.getRetainedSlices()) {
-    extractManagedOutOfLineBlobRefs(slice).forEach(retainedRefs::add);
+    extractManagedOutOfLineBlobRefsFromBaseFile(slice).forEach(retainedRefs::add);
   }
 
   Set<BlobRef> localOrphans = new HashSet<>(expiredRefs);
@@ -1147,12 +1173,19 @@ references.
 
 ### C5: MOR log updates shadow base file blob refs
 
-**Satisfied.** Retained blob refs are collected as the union of base file refs and log file refs.
-This over-counts -- a shadowed base ref appears live even though the log update superseded it. This
-is safe: over-retention prevents premature deletion. After compaction merges the log into a new base
-file, the superseded ref disappears, and the next clean cycle deletes the orphaned blob. **Note:**
-MOR over-retention is unbounded in duration and depends on compaction frequency. This is a known
-trade-off explicitly accepted for correctness.
+**Satisfied.** The read strategy is asymmetric:
+
+- *Expired slices* read base + log files: log files must be read because blob refs introduced and
+  superseded within the log chain before compaction exist only in expired logs. Skipping them would
+  create permanent orphans (R2 violation).
+- *Retained slices* read base files only: cleaning is fenced on compaction, so retained base files
+  contain the merged state. Shadowed base refs (superseded by uncompacted logs) appear in the
+  retained set, causing safe over-retention. After the next compaction, the superseded ref
+  disappears and the next clean cycle deletes the orphaned blob.
+
+**Note:** Over-retention from shadowed retained refs is bounded by compaction frequency. For tables
+with infrequent compaction, blob storage waste could be significant. This is a known trade-off
+explicitly accepted for correctness.
 
 ### C6: Existing cleaner is per-file-group scoped
 
@@ -1709,16 +1742,19 @@ table scan path until the index is fully built.
 
 ### 10.1 Cost model for Stage 1
 
-For each cleaned file group, Stage 1 reads blob ref fields from expired and retained slices.
+For each cleaned file group, Stage 1 reads blob ref fields from expired and retained slices. The
+read strategy is asymmetric: expired slices read base + log files; retained slices read base files
+only (cleaning is fenced on compaction, so retained base files contain the merged state).
 
 **Base files (Parquet):** Columnar projection reads only the blob ref struct columns. Cost per base
 file: one Parquet column chunk read. Typical size: 100 bytes/record * 500K records = ~50MB per
 slice for the blob ref column.
 
-**Log files (MOR):** Log files are not columnar. Reading blob refs requires reading full log records
-and extracting the blob ref field. Cost per log file: proportional to log file size (full scan),
-not just the blob ref column. For a 100MB log file, the entire 100MB is read even though only ~50MB
-is blob ref data. This is 2x the cost of a base file projection.
+**Log files (MOR, expired slices only):** Log files are not columnar. Reading blob refs requires
+reading full log records and extracting the blob ref field. Cost per log file: proportional to log
+file size (full scan), not just the blob ref column. For a 100MB log file, the entire 100MB is read
+even though only ~50MB is blob ref data. This is 2x the cost of a base file projection. However,
+log reads are only needed for expired slices -- retained slices skip log reads entirely.
 
@@ -1727,13 +1763,14 @@ is blob ref data. This is 2x the cost of a base file projection.
 
 | Parameter                | Base file (Parquet) | Log file (MOR)       |
 |--------------------------|---------------------|----------------------|
 | Parallelizable           | Yes                 | Yes                  |
 | Records per slice        | ~500K               | ~500K (worst case)   |
 | Blob ref size per record | ~100 bytes          | ~100 bytes           |
+| Used for                 | Expired + retained  | Expired only         |
 
 **Total cost per FG:**
 
-| Table type | Retained slices | Expired slices | Reads per FG    | Data per FG |
-|------------|-----------------|----------------|-----------------|-------------|
-| COW        | 3-5 base        | 1-3 base       | 4-8             | 200-400MB   |
-| MOR        | 3-5 (base+log)  | 1-3 (base+log) | 4-8 base + logs | 200MB-1GB   |
+| Table type | Retained slices         | Expired slices | Reads per FG            | Data per FG |
+|------------|-------------------------|----------------|-------------------------|-------------|
+| COW        | 3-5 base                | 1-3 base       | 4-8 base                | 200-400MB   |
+| MOR        | 3-5 base (logs skipped) | 1-3 (base+log) | 3-5 base + 1-3 base+log | 150MB-600MB |
 
 **Memory budget analysis (addresses finding 3.9):**
 
@@ -2006,11 +2043,29 @@ function isHudiCreatedBlob(blobPath, blobPrefix):
     return blobPath.startsWith(blobPrefix)
 
+
+function extractManagedOutOfLineBlobRefsFromBaseFile(slice):
+    """
+    Extract managed, out-of-line blob refs from a file slice's base file only.
+    Uses columnar projection on the blob ref struct columns.
+    Used for retained slices (cleaning is fenced on compaction, so base files
+    contain the merged state).
+    Returns Stream<BlobRef>.
+    """
+    refs = Stream.empty()
+    if slice.getBaseFile().isPresent():
+        refs = projectBlobRefColumnsFromParquet(slice.getBaseFile().get())
+            .filter(r -> r.type == OUT_OF_LINE && r.managed == true)
+            .map(r -> BlobRef(r.path, r.offset, r.length))
+    return refs
+
+
 function extractManagedOutOfLineBlobRefs(slice):
     """
-    Extract managed, out-of-line blob refs from a file slice.
+    Extract managed, out-of-line blob refs from a file slice (base + log files).
     For base files: columnar projection on the blob ref struct columns.
     For log files: full record read with field extraction.
+    Used for expired slices (log files must be read to find blob refs introduced
+    and superseded within the log chain before compaction).
     Returns Stream<BlobRef>.
     """
     refs = Stream.empty()
@@ -2021,7 +2076,7 @@ function extractManagedOutOfLineBlobRefs(slice):
         .filter(r -> r.type == OUT_OF_LINE && r.managed == true)
         .map(r -> BlobRef(r.path, r.offset, r.length)))
 
-    // Log files (MOR only)
+    // Log files (MOR only -- required for expired slices to avoid permanent orphans)
     for logFile in slice.getLogFiles():
         refs = concat(refs, extractBlobRefFieldFromLogFile(logFile)
             .filter(r -> r.type == OUT_OF_LINE && r.managed == true)
