This is an automated email from the ASF dual-hosted git repository.

vhs pushed a commit to branch rfc-blob-cleaner
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit 574df5bc244e94d2a1f0055537836f9486193ff4
Author: voon <[email protected]>
AuthorDate: Fri Mar 20 16:34:29 2026 +0800

    Update rollout plan
---
 rfc/rfc-100/rfc-100-blob-cleaner-design.md | 78 +++++++++++++++++++-----------
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/rfc/rfc-100/rfc-100-blob-cleaner-design.md b/rfc/rfc-100/rfc-100-blob-cleaner-design.md
index a7f80e14bcad..e94d3ad4eafb 100644
--- a/rfc/rfc-100/rfc-100-blob-cleaner-design.md
+++ b/rfc/rfc-100/rfc-100-blob-cleaner-design.md
@@ -142,6 +142,20 @@ MDT secondary index. The dispatch mechanism is a zero-cost string prefix check o
 | **Stage 2** | Cross-file-group     | Verify external blob candidates against MDT secondary index or fallback scan     | Only when external candidates exist    |
 | **Stage 3** | Container resolution | Determine delete vs. flag-for-compaction at the container level                  | Only when container blobs are involved |
 
+### Independent Implementability
+
+The three stages have clean input/output interfaces and can be implemented, tested, and shipped
+independently:
+
+| Stage   | Input                                                   | Output                                              |
+|---------|---------------------------------------------------------|-----------------------------------------------------|
+| Stage 1 | `FileGroupCleanResult` (expired + retained slices)      | `hudi_blob_deletes`, `external_candidates`          |
+| Stage 2 | `external_candidates`, `cleaned_fg_ids`                 | `external_deletes`                                  |
+| Stage 3 | `hudi_blob_deletes` + `external_deletes`, retained refs | `blob_files_to_delete`, `containers_for_compaction` |
+
+A shared foundation layer must land first (see [Rollout / Adoption Plan](#rollout--adoption-plan)), after which stages
+can proceed in any order.
+
 ### Key Decisions
 
 | Decision            | Choice                                                  | Rationale                                                          |
@@ -337,11 +351,11 @@ for path in candidate_paths:
 resolution with per-candidate short-circuit. Steps 1 and 2 are each a single I/O pass; step 3 is
 pure in-memory hash set lookups (~0ms).
 
-| Step                        | I/O                                       | Cost                     |
-|-----------------------------|-----------------------------------------|--------------------------|
-| 1. Prefix scan (batched)    | Single MDT call for N candidate paths    | ~2-5s for 2K candidates  |
-| 2. Record index (batched)   | Single sorted HFile forward-scan         | ~1-2s for 6K record keys |
-| 3. In-memory resolution     | Hash set checks (cleaned_fg_ids)         | ~0ms                     |
+| Step                      | I/O                                   | Cost                     |
+|---------------------------|---------------------------------------|--------------------------|
+| 1. Prefix scan (batched)  | Single MDT call for N candidate paths | ~2-5s for 2K candidates  |
+| 2. Record index (batched) | Single sorted HFile forward-scan      | ~1-2s for 6K record keys |
+| 3. In-memory resolution   | Hash set checks (cleaned_fg_ids)      | ~0ms                     |
 
 **Index definition.** Uses the existing `HoodieIndexDefinition` mechanism with
 `sourceFields = ["<blob_col>", "reference", "external_path"]`. The nested field path is supported
@@ -667,20 +681,20 @@ cleaning:
 
 ### Back-of-Envelope: Example 7 (50K FGs, 2K External Candidates)
 
-| Parameter                               | Value     | Notes                                           |
-|-----------------------------------------|-----------|-------------------------------------------------|
-| FGs cleaned this cycle                  | 500       | 1% of table                                     |
-| Stage 1: reads per FG                   | ~6        | 3 retained + 3 expired slices                  |
-| Stage 1: total reads                    | 3,000     | Parallelized across executors, ~20s            |
-| External blob candidates                | 2,000     | Locally orphaned in cleaned FGs                |
-| Avg refs per candidate                  | 3         | Typical: video in a few playlists              |
-| Total record keys                       | 6,000     | 2,000 * 3                                      |
-| **Stage 2 cost**                        |           |                                                 |
-| Step 1: batched prefix scan             | 1 call    | Returns 6K record keys, ~2-5s                  |
-| Step 2: batched record index lookup     | 1 call    | 6K keys sorted, single HFile scan, ~1-2s       |
-| Step 3: in-memory resolution            | 6K checks | Hash set lookups against cleaned_fg_ids, ~0ms  |
-| **Total Stage 2**                       | **~3-7s** |                                                 |
-| Comparison: naive full-table scan       | 12.5TB    | 50K FGs * 5 slices * 50MB = prohibitive        |
+| Parameter                           | Value     | Notes                                         |
+|-------------------------------------|-----------|-----------------------------------------------|
+| FGs cleaned this cycle              | 500       | 1% of table                                   |
+| Stage 1: reads per FG               | ~6        | 3 retained + 3 expired slices                 |
+| Stage 1: total reads                | 3,000     | Parallelized across executors, ~20s           |
+| External blob candidates            | 2,000     | Locally orphaned in cleaned FGs               |
+| Avg refs per candidate              | 3         | Typical: video in a few playlists             |
+| Total record keys                   | 6,000     | 2,000 * 3                                     |
+| **Stage 2 cost**                    |           |                                               |
+| Step 1: batched prefix scan         | 1 call    | Returns 6K record keys, ~2-5s                 |
+| Step 2: batched record index lookup | 1 call    | 6K keys sorted, single HFile scan, ~1-2s      |
+| Step 3: in-memory resolution        | 6K checks | Hash set lookups against cleaned_fg_ids, ~0ms |
+| **Total Stage 2**                   | **~3-7s** |                                               |
+| Comparison: naive full-table scan   | 12.5TB    | 50K FGs * 5 slices * 50MB = prohibitive       |
 
 ### Memory Budget
 
@@ -709,18 +723,26 @@ Sections 10.1-10.3.
 
 ## Rollout / Adoption Plan
 
-### Phase 1: Flow 1 Only (Hudi-Created Blobs)
+Each stage can be implemented, tested, and shipped independently once the foundation layer is in
+place (see [Independent Implementability](#independent-implementability)).
+
+**Foundation (shared prerequisite).** `CleanPlanner` refactoring (policy methods return
+`FileGroupCleanResult`), `BlobRef` type, schema changes (nullable `blobFilesToDelete` and
+`containersToCompact` fields), and the `hasBlobColumns` zero-cost gate.
+
+**Stage 1 (per-FG cleanup).** Set-difference logic and dispatch by blob category. Produces
+`hudi_blob_deletes` (immediate) and `external_candidates` (for Stage 2).
 
-- Requires no new dependencies (no MDT secondary index, no record index).
-- `CleanPlanner` refactoring + Stage 1 + Stage 3.
-- Tables with only Hudi-created blobs get full cleanup.
-- Non-blob tables are completely unaffected (zero-cost gate).
+**Stage 2 (cross-FG verification) -- priority.** Flow 2 (external blobs) is the primary initial
+use case -- cross-FG verification prevents premature deletion of shared blobs. Requires MDT +
+record index + secondary index on `reference.external_path` (P6). Includes fallback table scan
+with circuit breaker.
 
-### Phase 2: Flow 2 (External Blobs)
+**Stage 3 (container lifecycle).** Delete-entire-file vs. flag-for-compaction at the container
+level. Needed only when container files are used.
 
-- Requires MDT + record index + secondary index on `reference.external_path` (P6).
-- Stage 2 (MDT secondary index path) + fallback table scan with circuit breaker.
-- Writer-side conflict check in `preCommit()` for external blob concurrency safety.
+**Writer-side conflict check.** `preCommit()` conflict check for Flow 2 concurrency safety.
+Closes the writer-cleaner race window. Independent of the three stages.
 
 ### Backward Compatibility
 
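
Reader's note on Stage 1 (not part of the commit). The rollout text describes Stage 1 as "set-difference logic and dispatch by blob category". One plausible reading is sketched below in Python. Everything here is an assumption made for illustration: the shape of `FileGroupCleanResult`, the field names `expired_refs` and `retained_refs`, the function `stage1`, and above all the `HUDI_BLOB_PREFIX` value, since the diff only says that dispatch is a string prefix check and does not show the prefix.

```python
from dataclasses import dataclass, field

# Hypothetical prefix marking Hudi-created blobs. The RFC text in this diff
# only says dispatch is a string prefix check; the real prefix is not shown.
HUDI_BLOB_PREFIX = "hudi-managed/"


@dataclass
class FileGroupCleanResult:
    """Illustrative stand-in: blob paths referenced by each set of slices."""
    fg_id: str
    expired_refs: set = field(default_factory=set)
    retained_refs: set = field(default_factory=set)


def stage1(results):
    """Per-file-group set difference, then dispatch by blob category."""
    hudi_blob_deletes, external_candidates = set(), set()
    for result in results:
        # Paths referenced only by expired slices are locally orphaned.
        for path in result.expired_refs - result.retained_refs:
            if path.startswith(HUDI_BLOB_PREFIX):
                hudi_blob_deletes.add(path)    # no cross-FG check needed
            else:
                external_candidates.add(path)  # handed to Stage 2
    return hudi_blob_deletes, external_candidates


result = FileGroupCleanResult(
    fg_id="fg-1",
    expired_refs={"hudi-managed/a.bin", "hudi-managed/b.bin", "s3://bucket/video.mp4"},
    retained_refs={"hudi-managed/b.bin"},
)
deletes, candidates = stage1([result])
```

In this example `b.bin` is still referenced by a retained slice, so it is kept; `a.bin` goes to `hudi_blob_deletes`, and the external path becomes a Stage 2 candidate rather than being deleted directly.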

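
Reader's note on the Example 7 figures (not part of the commit). The numbers in the "Back-of-Envelope" table can be re-derived with a few lines of Python. The inputs are copied from the table; the variable names are illustrative, and the terabyte figure assumes decimal units (1 TB = 1,000,000 MB).

```python
# Inputs copied from the "Back-of-Envelope: Example 7" table.
total_fgs = 50_000
fgs_cleaned = 500
reads_per_fg = 6             # 3 retained + 3 expired slices
candidates = 2_000
refs_per_candidate = 3

cleaned_fraction = fgs_cleaned / total_fgs       # share of the table cleaned
stage1_reads = fgs_cleaned * reads_per_fg        # "Stage 1: total reads"
record_keys = candidates * refs_per_candidate    # "Total record keys"

# Stage 2 latency per step as (low, high) seconds: prefix scan, record
# index lookup, in-memory resolution.
steps = [(2, 5), (1, 2), (0, 0)]
stage2_low = sum(low for low, _ in steps)
stage2_high = sum(high for _, high in steps)

# Naive full-table scan: 50K FGs * 5 slices * 50 MB, in decimal TB.
naive_scan_tb = total_fgs * 5 * 50 / 1_000_000
```

The results match the table: 1% of the table cleaned, 3,000 Stage 1 reads, 6,000 record keys, a 3 to 7 second Stage 2, and 12.5 TB for the naive scan.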