suryaprasanna opened a new issue, #19054:
URL: https://github.com/apache/hudi/issues/19054

   ### Feature Description
   
   ## What the feature achieves
   
   Decide **per file group, at ingestion time**, whether to append a delta log file or write a new base file directly, turning the table into a **per-file-group hybrid of Copy-on-Write and Merge-on-Read**:
   
   - **Cold file groups** (sparse updates, e.g. older partitions) → **log files** (normal MoR).
   - **Hot file groups** (heavy updates, e.g. recent partitions) → **write a new base file directly**, skipping the log-then-immediately-compact round trip.
   
   The routing decision is **workload-aware**: it reuses the per-file-group statistics Hudi already computes during workload profiling (insert/update counts, current file group size, estimated batch size), so no extra scan is needed to make the call.
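
A minimal sketch of the routing call, assuming the profiling step can expose, per file group, the insert and update counts, the current base file size, and an average record size. `FileGroupStats`, `route`, and the `hot_ratio` default of 0.2 are hypothetical illustrations, not existing Hudi APIs or agreed values. The rule compares the estimated size of the incoming batch against the base file size, one of the threshold options listed under the configuration changes.

```python
from dataclasses import dataclass
from enum import Enum


class WriteAction(Enum):
    APPEND_LOG = "append_log"  # cold: normal MoR delta log
    WRITE_BASE = "write_base"  # hot: write a new base file directly


@dataclass(frozen=True)
class FileGroupStats:
    """Hypothetical per-file-group numbers taken from workload profiling."""
    file_id: str
    num_inserts: int
    num_updates: int
    base_file_bytes: int
    avg_record_bytes: int

    @property
    def estimated_batch_bytes(self) -> int:
        # Rough size of the incoming records routed to this file group.
        return (self.num_inserts + self.num_updates) * self.avg_record_bytes


def route(stats: FileGroupStats, hot_ratio: float = 0.2) -> WriteAction:
    """Pick a write path from statistics already in hand (no extra scan)."""
    if stats.base_file_bytes <= 0:
        # No base file yet: nothing to merge against, so write a base file.
        return WriteAction.WRITE_BASE
    ratio = stats.estimated_batch_bytes / stats.base_file_bytes
    return WriteAction.WRITE_BASE if ratio >= hot_ratio else WriteAction.APPEND_LOG


MB = 1024 * 1024
cold = FileGroupStats("fg-old", num_inserts=0, num_updates=50,
                      base_file_bytes=128 * MB, avg_record_bytes=1024)
hot = FileGroupStats("fg-new", num_inserts=1_000, num_updates=40_000,
                     base_file_bytes=128 * MB, avg_record_bytes=1024)
```

With these illustrative numbers, `route(cold)` returns `WriteAction.APPEND_LOG` (the batch is about 0.04% of the base file) and `route(hot)` returns `WriteAction.WRITE_BASE` (about 31%). A byte ratio is used here because it tracks the trade-off directly: once a log approaches a sizable fraction of its base, rewriting the base once starts to look cheaper than merging that log on every snapshot read.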
   
   **Benefits:**
   - No separate compaction step for hot file groups.
   - Data is materialized into a base file **at delta-commit time**, so freshness is preserved: no compaction-lag window and no async compaction backlog.
   - Write amplification is paid only where it is unavoidable (the genuinely hot groups); cold groups stay cheap and log-based.
   - Cheaper and more consistent snapshot reads: hot groups have a fresh base file with few or no logs to merge, and cold groups merge only small logs.
   
   ## Why this feature is needed
   
   Our critical datasets are **Copy-on-Write**, so even a tiny update rewrites the *entire* base file of its file group, causing heavy **write amplification**.
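
A back-of-the-envelope calculation makes the CoW cost concrete. The sizes below are illustrative assumptions, not measurements from these datasets, and the log-path figure ignores log-block headers and file-format overhead.

```python
MB = 1024 * 1024


def write_amplification(bytes_written: int, bytes_changed: int) -> float:
    """Bytes physically written per byte of logically changed data."""
    return bytes_written / bytes_changed


base_file_bytes = 128 * MB  # illustrative base file size
record_bytes = 1024         # illustrative record size
updated_records = 10        # a "tiny" update to one file group

changed = updated_records * record_bytes
cow = write_amplification(base_file_bytes, changed)  # CoW rewrites the whole base file
mor = write_amplification(changed, changed)          # a log append writes only the changed records
```

Here `cow` is 13107.2 and `mor` is 1.0: under CoW, every changed byte costs roughly 13,000 written bytes.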
   
   Moving to **Merge-on-Read** would fix the write cost, but it introduces new problems for this workload:
   - **Snapshot reads merge base and log files on the fly**, making queries slower and costlier.
   - **Freshness is the top priority**: these tables feed many downstream datasets, so we cannot tolerate a staleness window while waiting on compaction.
   
   **Key pattern:** updates are highly skewed. A small set of file groups in recent partitions receives most of the updates, while older partitions barely change. This skew is exactly what makes a per-file-group decision worthwhile.
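
The skew argument can be checked with the same kind of arithmetic. This sketch assumes 100 file groups of 128 MB, 1 KB records, and a batch where two recent-partition groups take 100,000 updates between them and the other 98 take 5 each. It also assumes the routing has already marked the two recent groups as hot. All numbers and names are illustrative.

```python
MB = 1024 * 1024
RECORD_BYTES = 1024    # illustrative record size
BASE_BYTES = 128 * MB  # illustrative base file size per file group

# Illustrative skewed batch: file_id -> number of updated records.
updates = {"hot-1": 60_000, "hot-2": 40_000}
updates.update({f"cold-{i}": 5 for i in range(98)})
hot_ids = {"hot-1", "hot-2"}  # assumed output of the routing decision


def bytes_written_cow(batch: dict) -> int:
    """CoW: every touched file group rewrites its whole base file."""
    return sum(BASE_BYTES for n in batch.values() if n > 0)


def bytes_written_hybrid(batch: dict, hot: set) -> int:
    """Hybrid: hot groups rewrite their base, cold groups append only changed records."""
    total = 0
    for file_id, n in batch.items():
        total += BASE_BYTES if file_id in hot else n * RECORD_BYTES
    return total


cow = bytes_written_cow(updates)
hybrid = bytes_written_hybrid(updates, hot_ids)
```

With these numbers CoW writes 12,800 MB for the batch while the hybrid writes about 256.5 MB, roughly 50 times less. The saving comes almost entirely from not rewriting the 98 barely-touched groups, while the two hot groups still get fresh base files.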
   
   ### Options we ruled out
   
   | Approach | Why it fails |
   |---|---|
   | MoR + Read-Optimized view + aggressive compaction | Compaction rewrites base files so often that we are back to CoW-level write amplification. |
   | MoR + compact only hot file groups + Read-Optimized view | The RO view becomes **inconsistent**: uncompacted file groups silently drop their recent updates. |
   | MoR + compact only hot file groups + **snapshot view** | Works, but still needs a **separate compaction step** right after ingestion, which delays freshness further; queries that start before compaction finishes take the merge hit. |
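
The Read-Optimized inconsistency in the second row can be reproduced with a toy table. This is an illustrative model, not Hudi's reader: each file group is a `(base, logs)` pair, the RO view reads base files only, and the snapshot view merges logs over the base. All function names are hypothetical.

```python
def merge(base: dict, logs: list) -> dict:
    """Apply log updates over the base records; later logs win."""
    merged = dict(base)
    for log in logs:
        merged.update(log)
    return merged


def read_optimized(table: dict) -> dict:
    """RO view: base files only, logs are ignored."""
    out = {}
    for base, _logs in table.values():
        out.update(base)
    return out


def snapshot(table: dict) -> dict:
    """Snapshot view: each base merged with its logs."""
    out = {}
    for base, logs in table.values():
        out.update(merge(base, logs))
    return out


def compact_only(table: dict, file_ids: set) -> dict:
    """Compact the listed file groups; the others keep their logs."""
    return {
        fid: (merge(base, logs), []) if fid in file_ids else (base, logs)
        for fid, (base, logs) in table.items()
    }


# Toy table: file_id -> (base records, list of log update dicts).
table = {
    "fg-hot": ({"a": 1}, [{"a": 2}]),
    "fg-cold": ({"b": 1}, [{"b": 2}]),
}
after = compact_only(table, {"fg-hot"})
```

After compacting only `fg-hot`, `read_optimized(after)` returns `{"a": 2, "b": 1}`, so key `b` still shows its old value, while `snapshot(after)` returns `{"a": 2, "b": 2}`.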
   
   The proposed feature removes that residual compaction step entirely by materializing the hot-group updates as base files during the delta commit itself.
   
   ### User Experience
   
   **How users will use this feature:**
   Users **opt in** per table to **workload-aware updates** on a MoR table. Once enabled, ingestion automatically routes heavy-update ("hot") file groups to a direct base-file write while leaving cold file groups as normal log appends, with no manual per-partition compaction tuning required. Existing readers continue to use the snapshot (real-time) view and simply see fresher, cheaper-to-read data for the hot groups.
   
   **Configuration changes needed:** _(work in progress)_
   - An opt-in toggle to enable workload-aware updates on a MoR table.
   - A "hot file group" threshold (e.g. estimated log size vs. base file size, or update count/ratio per file group).
   - An optional cap on the number of base files rewritten per commit, to bound ingestion latency.
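
To make the three knobs concrete, here is how they might sit next to standard Hudi write options in a PySpark job, plus a sketch of how the per-commit cap could be enforced. The `hoodie.table.name` and `hoodie.datasource.write.*` keys are existing Hudi options; the three `hoodie.write.workload.aware.*` keys, their values, and `select_base_rewrites` are hypothetical placeholders, since the configuration is still a work in progress.

```python
# Existing Hudi write options plus HYPOTHETICAL keys for the proposed feature.
hudi_options = {
    "hoodie.table.name": "events",  # illustrative table name
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    # Hypothetical: opt-in toggle, hot threshold, and per-commit rewrite cap.
    "hoodie.write.workload.aware.updates.enable": "true",
    "hoodie.write.workload.aware.hot.ratio": "0.2",
    "hoodie.write.workload.aware.max.base.rewrites.per.commit": "2",
}
# In a real job: df.write.format("hudi").options(**hudi_options).mode("append").save(path)


def select_base_rewrites(ratios: dict, hot_ratio: float, max_rewrites: int) -> set:
    """Choose file groups for a direct base-file write, hottest first, up to the cap.

    ratios maps file_id -> estimated log bytes / base file bytes. Groups above the
    threshold but beyond the cap fall back to normal log appends for this commit.
    """
    hot = sorted((fid for fid, r in ratios.items() if r >= hot_ratio),
                 key=lambda fid: ratios[fid], reverse=True)
    return set(hot[:max(max_rewrites, 0)])


ratios = {"fg-1": 0.9, "fg-2": 0.05, "fg-3": 0.4, "fg-4": 0.25}
chosen = select_base_rewrites(
    ratios,
    hot_ratio=float(hudi_options["hoodie.write.workload.aware.hot.ratio"]),
    max_rewrites=int(hudi_options["hoodie.write.workload.aware.max.base.rewrites.per.commit"]),
)
```

Here `chosen` is `{"fg-1", "fg-3"}`: `fg-4` clears the threshold but exceeds the cap of 2, so it is logged this commit. Ranking by ratio spends the limited rewrite budget on the groups whose logs would be largest relative to their base files.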

   **API changes:** _(work in progress)_

   **Usage examples:** _(work in progress)_
   
   ### Hudi RFC Requirements
   
   **RFC PR link:** (N/A for now)
   
   **Why RFC is/isn't needed:**
   - Does this change public interfaces/APIs? **Yes**: it adds new (opt-in) write configuration properties, with no breaking changes to existing APIs.
   - Does this change storage format? **No new file formats**: base and log files remain standard.