This is an automated email from the ASF dual-hosted git repository.
vhs pushed a commit to branch rfc-blob-cleaner
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/rfc-blob-cleaner by this push:
new 0cc44ece94a8 Update F8, F9 references.
0cc44ece94a8 is described below
commit 0cc44ece94a861885afbb2b871e18d1aa5859b60
Author: voon <[email protected]>
AuthorDate: Fri Mar 20 17:09:16 2026 +0800
Update F8, F9 references.
---
rfc/rfc-100/rfc-100-blob-cleaner-design.md | 4 ++--
rfc/rfc-100/rfc-100-blob-cleaner-problem.md | 2 +-
rfc/rfc-100/rfc-100-blob-cleaner.md | 12 ++++++------
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/rfc/rfc-100/rfc-100-blob-cleaner-design.md b/rfc/rfc-100/rfc-100-blob-cleaner-design.md
index 54b50fbe98eb..db65d648d498 100644
--- a/rfc/rfc-100/rfc-100-blob-cleaner-design.md
+++ b/rfc/rfc-100/rfc-100-blob-cleaner-design.md
@@ -281,8 +281,8 @@ expired set. Blob cleanup inherits this: savepointed slices are always in the re
**Why correct for replaced file groups (clustering).** For replaced FGs, `retained_slices` is empty
and `expired_slices` is all slices. For Hudi-created blobs, all are safe to delete (clustering
-creates new blobs in the target FG via F8). For external blobs, all flow to Stage 2 for cross-FG
-verification (clustering copies the pointer via F9, so Stage 2 finds the reference in the target FG).
+creates new blob files in the target FG). For external blobs, all flow to Stage 2 for cross-FG
+verification (clustering copies the pointer to the target FG, so Stage 2 finds the reference there).
```mermaid
flowchart TD
diff --git a/rfc/rfc-100/rfc-100-blob-cleaner-problem.md b/rfc/rfc-100/rfc-100-blob-cleaner-problem.md
index b5af398b9363..61b2f79f5e16 100644
--- a/rfc/rfc-100/rfc-100-blob-cleaner-problem.md
+++ b/rfc/rfc-100/rfc-100-blob-cleaner-problem.md
@@ -199,7 +199,7 @@ clustering, the source file group's slices still reference the original blobs un
cleaned. The target file group's slices reference either new blobs (Hudi-managed) or the same
external blobs.
-*Source: RFC-100 lines 212-214; alternatives analysis F8, F9.*
+*Source: RFC-100 lines 212-214.*
### C9: Savepoints freeze file slices and their blob refs
diff --git a/rfc/rfc-100/rfc-100-blob-cleaner.md b/rfc/rfc-100/rfc-100-blob-cleaner.md
index 1392333a08d1..6365f6807d4d 100644
--- a/rfc/rfc-100/rfc-100-blob-cleaner.md
+++ b/rfc/rfc-100/rfc-100-blob-cleaner.md
@@ -89,7 +89,7 @@ commits (C11). This is specified in RFC-100 line 170.
**P3: Hudi-created blob files are scoped to a single file group.** A Hudi-created blob is written as
part of a specific file group's commit. No other file group's write path produces a reference to that
-blob. After clustering, the target file group creates *new* blob files (C8/F8). After clustering
+blob. After clustering, the target file group creates *new* blob files (C8). After clustering
completes, references in both source and target FGs may temporarily coexist, but the source FG's
references are to the *original* blobs, and the target FG's references are to *new* blobs. Both sets
are scoped to their respective file groups.
@@ -231,7 +231,7 @@ remain in the retained set. No additional logic needed.
**Why correct for replaced file groups.** See Section 4.2 for how replaced file groups (cleaned via
`getReplacedFilesEligibleToClean()`) integrate with Stage 1. For replaced FGs, `retained_slices` is
empty and `expired_slices` is all slices in the FG. Every blob ref is an orphan candidate within the
-FG. For Hudi-created blobs, this is correct (P3 -- the target FG created new blobs via F8). For
+FG. For Hudi-created blobs, this is correct (P3 -- clustering created new blobs in the target FG). For
external blobs, every candidate flows to Stage 2 for cross-FG verification.
### 3.2 Stage 2: Cross-FG Verification for External Blobs
@@ -808,9 +808,9 @@ The key properties for replaced FGs:
1. `retainedSlices` is always empty -- the entire FG is being retired.
2. `expiredSlices` is all (non-savepointed) slices in the FG.
3. For Hudi-created blobs: `local_orphans = expired_refs - {} = expired_refs`. All are safe to
- delete because clustering created new blobs in the target FG (F8, P3).
+ delete because clustering created new blobs in the target FG (P3).
4. For external blobs: `local_orphans = expired_refs`. All flow to `external_candidates` for Stage
- 2 verification. Clustering copied the pointer to the target FG (F9), so Stage 2 will find the
+ 2 verification. Clustering copied the pointer to the target FG, so Stage 2 will find the
reference in the target FG and prevent deletion.
### 4.3 Where the code lives
@@ -1215,10 +1215,10 @@ data table.
### C8: Clustering moves blob refs between file groups
-**Satisfied.** For Hudi-created blobs, clustering creates new blob files in the target FG (F8).
+**Satisfied.** For Hudi-created blobs, clustering creates new blob files in the target FG.
After clustering, the source FG's old slices reference the original blobs, and these become orphaned
when the source FG is cleaned. Stage 1 correctly identifies them as locally orphaned. For external
-blobs, clustering copies the pointer (F9), creating cross-FG references. Stage 2 handles this: the
+blobs, clustering copies the pointer, creating cross-FG references. Stage 2 handles this: the
external blob is not deleted until no active slice in any FG references it. The replaced FG
integration (Section 4.2) ensures this flows through the correct code path.
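The hunks above describe a two-stage flow for replaced file groups. Stage 1 computes per-FG local orphans as `expired_refs - retained_refs`, which reduces to `expired_refs` for a replaced FG. Stage 2 keeps any external blob that is still referenced from another FG. The Python snippet below is a minimal sketch of that reasoning and is not Hudi's implementation. `BlobRef`, `FileGroupRefs`, `stage1`, `stage2` and `plan_deletes` are hypothetical names. The sketch also makes three simplifying assumptions:

- Blobs are identified by path alone.
- Savepointed slices are already folded into `retained_refs`.
- Every FG that is not being cleaned contributes all of its refs as retained refs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlobRef:
    """Hypothetical blob reference: identified by path, tagged by origin."""
    path: str
    hudi_created: bool  # False means an external blob the user pointed at


@dataclass
class FileGroupRefs:
    """Hypothetical per-file-group view of blob refs, split by slice retention."""
    fg_id: str
    retained_refs: set = field(default_factory=set)  # refs from retained slices
    expired_refs: set = field(default_factory=set)   # refs from expired slices


def stage1(fg):
    """Stage 1: refs held only by expired slices are local orphan candidates."""
    local_orphans = fg.expired_refs - fg.retained_refs
    hudi_deletes = {r for r in local_orphans if r.hudi_created}
    external_candidates = {r for r in local_orphans if not r.hudi_created}
    return hudi_deletes, external_candidates


def stage2(external_candidates, all_fgs):
    """Stage 2: drop any external candidate still referenced by a retained slice in any FG."""
    still_referenced = set()
    for fg in all_fgs:
        still_referenced |= fg.retained_refs
    return external_candidates - still_referenced


def plan_deletes(fgs_to_clean, all_fgs):
    """Run Stage 1 per cleaned FG, then Stage 2 across every FG."""
    hudi_deletes, external_candidates = set(), set()
    for fg in fgs_to_clean:
        local_hudi, local_external = stage1(fg)
        hudi_deletes |= local_hudi
        external_candidates |= local_external
    return hudi_deletes, stage2(external_candidates, all_fgs)


# Hypothetical clustering scenario: fg-src was replaced by fg-tgt.
src_blob = BlobRef("fg-src/blob-001", hudi_created=True)
new_blob = BlobRef("fg-tgt/blob-001", hudi_created=True)           # new blob written by clustering
shared_ext = BlobRef("s3://bucket/img-1.png", hudi_created=False)  # pointer copied to fg-tgt
stale_ext = BlobRef("s3://bucket/img-0.png", hudi_created=False)   # referenced only by old slices

# Replaced FG: no retained slices, so every slice is expired.
source = FileGroupRefs("fg-src", expired_refs={src_blob, shared_ext, stale_ext})
target = FileGroupRefs("fg-tgt", retained_refs={new_blob, shared_ext})

hudi_deletes, external_deletes = plan_deletes([source], [source, target])
print(sorted(r.path for r in hudi_deletes))      # ['fg-src/blob-001']
print(sorted(r.path for r in external_deletes))  # ['s3://bucket/img-0.png']
```

In this sketch, the Hudi-created blob in the source FG is deleted directly, relying on P3. The copied external pointer (`shared_ext`) survives because the target FG's retained slice still references it, which matches the C8 argument. Only the external blob with no remaining reference is deleted.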