(hudi) branch master updated: docs: RFC-106 - Record Level and Secondary Index Support for Flink Writers (#17610)

danny0405 Wed, 10 Jun 2026 21:57:33 -0700

This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/master by this push:
     new 84c1f591481d docs: RFC-106 - Record Level and Secondary Index Support for Flink Writers (#17610)
84c1f591481d is described below

commit 84c1f591481d41bc8ffac313e5ae992db19ed3d5
Author: Danny Chan <[email protected]>
AuthorDate: Thu Jun 11 12:56:25 2026 +0800

    docs: RFC-106 - Record Level and Secondary Index Support for Flink Writers (#17610)
---
 rfc/README.md                         | 216 ++++++++++++++--------------
 rfc/rfc-106/index-compaction-flow.png | Bin 0 -> 329243 bytes
 rfc/rfc-106/index-write-flow.png      | Bin 0 -> 375082 bytes
 rfc/rfc-106/rfc-106.md                | 260 ++++++++++++++++++++++++++++++++++
 rfc/rfc-106/rli-access-pattern.png    | Bin 0 -> 96912 bytes
 5 files changed, 368 insertions(+), 108 deletions(-)

diff --git a/rfc/README.md b/rfc/README.md
index d9ec64a22dfd..4fb3de2cb8a0 100644
--- a/rfc/README.md
+++ b/rfc/README.md
@@ -34,111 +34,111 @@ The list of all RFCs can be found here.
 
 > Older RFC content is still 
 > [here](https://cwiki.apache.org/confluence/display/HUDI/RFC+Process).
 
-| RFC Number   | Title                                                         
                                                                                
                                                                         | 
Status                              |
-| ------------ | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 | ----------------------------------- |
-| 1            | [CSV Source Support for Delta Streamer](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+01+%3A+CSV+Source+Support+for+Delta+Streamer) | :white_check_mark: `COMPLETED`      |
-| 2            | [ORC Storage in Hudi](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113708439) | :white_check_mark: `COMPLETED`      |
-| 3            | [Timeline Service with Incremental File System View Syncing](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113708965) | :white_check_mark: `COMPLETED`      |
-| 4            | [Faster Hive incremental pull queries](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115513622) | :white_check_mark: `COMPLETED`      |
-| 5            | [HUI (Hudi WebUI)](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130027233) | :x: `ABANDONED`                     |
-| 6            | [Add indexing support to the log file](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+06+%3A+Add+indexing+support+to+the+log+file) | :x: `ABANDONED`                     |
-| 7            | [Point in time Time-Travel queries on Hudi table](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table) | :white_check_mark: `COMPLETED`      |
-| 8            | [Metadata based Record Index](./rfc-8/rfc-8.md) | :white_check_mark: `COMPLETED`      |
-| 9            | [Hudi Dataset Snapshot Exporter](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+09+%3A+Hudi+Dataset+Snapshot+Exporter) | :white_check_mark: `COMPLETED`      |
-| 10           | [Restructuring and auto-generation of docs](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+10+%3A+Restructuring+and+auto-generation+of+docs) | :white_check_mark: `COMPLETED`      |
-| 11           | [Refactor of the configuration framework of hudi project](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+11+%3A+Refactor+of+the+configuration+framework+of+hudi+project) | :x: `ABANDONED`                     |
-| 12           | [Efficient Migration of Large Parquet Tables to Apache Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi) | :white_check_mark: `COMPLETED`      |
-| 13           | [Integrate Hudi with Flink](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724520) | :white_check_mark: `COMPLETED`      |
-| 14           | [JDBC incremental puller](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller) | :white_check_mark: `COMPLETED`      |
-| 15           | [HUDI File Listing Improvements](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements) | :white_check_mark: `COMPLETED`      |
-| 16           | [Abstraction for HoodieInputFormat and RecordReader](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+16+Abstraction+for+HoodieInputFormat+and+RecordReader) | :white_check_mark: `COMPLETED`      |
-| 17           | [Abstract common meta sync module support multiple meta service](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+17+Abstract+common+meta+sync+module+support+multiple+meta+service) | :white_check_mark: `COMPLETED`      |
-| 18           | [Insert Overwrite API](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+18+Insert+Overwrite+API) | :white_check_mark: `COMPLETED`      |
-| 19           | [Clustering data for freshness and query performance](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance) | :white_check_mark: `COMPLETED`      |
-| 20           | [handle failed records](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+20+%3A+handle+failed+records) | :arrows_counterclockwise: `ONGOING` |
-| 21           | [Allow HoodieRecordKey to be Virtual](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+21+%3A+Allow+HoodieRecordKey+to+be+Virtual) | :white_check_mark: `COMPLETED`      |
-| 22           | [Snapshot Isolation using Optimistic Concurrency Control for multi-writers](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+22+%3A+Snapshot+Isolation+using+Optimistic+Concurrency+Control+for+multi-writers) | :white_check_mark: `COMPLETED`      |
-| 23           | [Hudi Observability metrics collection](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+23+%3A+Hudi+Observability+metrics+collection) | :x: `ABANDONED`                     |
-| 24           | [Hoodie Flink Writer Proposal](https://cwiki.apache.org/confluence/display/HUDI/RFC-24%3A+Hoodie+Flink+Writer+Proposal) | :white_check_mark: `COMPLETED`      |
-| 25           | [Spark SQL Extension For Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+25%3A+Spark+SQL+Extension+For+Hudi) | :white_check_mark: `COMPLETED`      |
-| 26           | [Optimization For Hudi Table Query](https://cwiki.apache.org/confluence/display/HUDI/RFC-26+Optimization+For+Hudi+Table+Query) | :white_check_mark: `COMPLETED`      |
-| 27           | [Data skipping index to improve query performance](https://cwiki.apache.org/confluence/display/HUDI/RFC-27+Data+skipping+index+to+improve+query+performance) | :white_check_mark: `COMPLETED`      |
-| 28           | [Support Z-order curve](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=181307144) | :white_check_mark: `COMPLETED`      |
-| 29           | [Hash Index](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index) | :white_check_mark: `COMPLETED`      |
-| 30           | [Batch operation](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+30%3A+Batch+operation) | :x: `ABANDONED`                     |
-| 31           | [Hive integration Improvement](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment) | :x: `ABANDONED`                     |
-| 32           | [Kafka Connect Sink for Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC-32+Kafka+Connect+Sink+for+Hudi) | :arrows_counterclockwise: `ONGOING` |
-| 33           | [Hudi supports more comprehensive Schema Evolution](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution) | :white_check_mark: `COMPLETED`      |
-| 34           | [Hudi BigQuery Integration](./rfc-34/rfc-34.md) | :white_check_mark: `COMPLETED`      |
-| 35           | [Make Flink MOR table writing streaming friendly](https://cwiki.apache.org/confluence/display/HUDI/RFC-35%3A+Make+Flink+MOR+table+writing+streaming+friendly) | :white_check_mark: `COMPLETED`      |
-| 36           | [HUDI Metastore Server](https://cwiki.apache.org/confluence/display/HUDI/%5BWIP%5D+RFC-36%3A+HUDI+Metastore+Server) | :arrows_counterclockwise: `ONGOING` |
-| 37           | [Hudi Metadata based Bloom Index](rfc-37/rfc-37.md) | :white_check_mark: `COMPLETED`      |
-| 38           | [Spark Datasource V2 Integration](./rfc-38/rfc-38.md) | :white_check_mark: `COMPLETED`      |
-| 39           | [Incremental source for Debezium](./rfc-39/rfc-39.md) | :white_check_mark: `COMPLETED`      |
-| 40           | [Connector for Trino](./rfc-40/rfc-40.md) | :white_check_mark: `COMPLETED`      |
-| 41           | [Snowflake Integration](./rfc-41/rfc-41.md), supported via [Apache XTable (Incubating)](https://xtable.apache.org/) | :x: `ABANDONED`                     |
-| 42           | [Consistent Hashing Index](./rfc-42/rfc-42.md) | :arrows_counterclockwise: `ONGOING` |
-| 43           | [Table Management Service](./rfc-43/rfc-43.md) | :x: `ABANDONED`                     |
-| 44           | [Hudi Connector for Presto](./rfc-44/rfc-44.md) | :white_check_mark: `COMPLETED`      |
-| 45           | [Asynchronous Metadata Indexing](./rfc-45/rfc-45.md) | :white_check_mark: `COMPLETED`      |
-| 46           | [Optimizing Record Payload Handling](./rfc-46/rfc-46.md) | :white_check_mark: `COMPLETED`      |
-| 47           | [Add Call Produce Command for Spark SQL](./rfc-47/rfc-47.md) | :white_check_mark: `COMPLETED`      |
-| 48           | [LogCompaction for MOR tables](./rfc-48/rfc-48.md) | :white_check_mark: `COMPLETED`      |
-| 49           | [Support sync with DataHub](./rfc-49/rfc-49.md) | :white_check_mark: `COMPLETED`      |
-| 50           | [Improve Timeline Server](./rfc-50/rfc-50.md) | :x: `ABANDONED`                     |
-| 51           | [Change Data Capture](./rfc-51/rfc-51.md) | :arrows_counterclockwise: `ONGOING` |
-| 52           | [Introduce Secondary Index to Improve HUDI Query Performance](./rfc-52/rfc-52.md) | :x: `ABANDONED`                     |
-| 53           | [Use Lock-Free Message Queue Improving Hoodie Writing Efficiency](./rfc-53/rfc-53.md) | :white_check_mark: `COMPLETED`      |
-| 54           | [New Table APIs and Streamline Hudi Configs](./rfc-54/rfc-54.md) | :x: `ABANDONED`                     |
-| 55           | [Improve Hive/Meta sync class design and hierarchies](./rfc-55/rfc-55.md) | :white_check_mark: `COMPLETED`      |
-| 56           | [Early Conflict Detection For Multi-Writer](./rfc-56/rfc-56.md) | :white_check_mark: `COMPLETED`      |
-| 57           | [DeltaStreamer Protobuf Support](./rfc-57/rfc-57.md) | :white_check_mark: `COMPLETED`      |
-| 58           | [Integrate column stats index with all query engines](./rfc-58/rfc-58.md) | :white_check_mark: `COMPLETED`      |
-| 59           | [Multiple event_time Fields Latest Verification in a Single Table](./rfc-59/rfc-59.md) | :eyes: `UNDER REVIEW`               |
-| 60           | [Federated Storage Layer](./rfc-60/rfc-60.md) | :eyes: `UNDER REVIEW`               |
-| 61           | [Snapshot view management](./rfc-61/rfc-61.md) | :eyes: `UNDER REVIEW`               |
-| 62           | [Diagnostic Reporter](./rfc-62/rfc-62.md) | :eyes: `UNDER REVIEW`               |
-| 63           | [Expression Indexes](./rfc-63/rfc-63.md) | :arrows_counterclockwise: `ONGOING` |
-| 64           | [New Hudi Table Spec API for Query Integrations](./rfc-64/rfc-64.md) | :eyes: `UNDER REVIEW`               |
-| 65           | [Partition TTL Management](./rfc-65/rfc-65.md) | :white_check_mark: `COMPLETED`      |
-| 66           | [Non Blocking Concurrency Control](./rfc-66/rfc-66.md) | :white_check_mark: `COMPLETED`      |
-| 67           | [Hudi Bundle Standards](./rfc-67/rfc-67.md) | :white_check_mark: `COMPLETED`      |
-| 68           | [A More Effective HoodieMergeHandler for COW Table with Parquet](./rfc-68/rfc-68.md) | :x: `ABANDONED`                     |
-| 69           | [Hudi 1.x](./rfc-69/rfc-69.md) | :white_check_mark: `COMPLETED`      |
-| 70           | [Hudi Reverse Streamer](./rfc/rfc-70/rfc-70.md) | :eyes: `UNDER REVIEW`               |
-| 71           | [Enhance OCC conflict detection](./rfc/rfc-71/rfc-71.md) | :eyes: `UNDER REVIEW`               |
-| 72           | [Redesign Hudi-Spark Integration](./rfc/rfc-72/rfc-72.md) | :arrows_counterclockwise: `ONGOING` |
-| 73           | [Multi-Table Transactions](./rfc-73/rfc-73.md) | :eyes: `UNDER REVIEW`               |
-| 74           | [`HoodieStorage`: Hudi Storage Abstraction and APIs](./rfc-74/rfc-74.md) | :arrows_counterclockwise: `ONGOING` |
-| 75           | [Hudi-Native HFile Reader and Writer](./rfc-75/rfc-75.md) | :white_check_mark: `COMPLETED`      |
-| 76           | [Auto Record key generation](./rfc-76/rfc-76.md) | :white_check_mark: `COMPLETED`      |
-| 77           | [Secondary Index](./rfc-77/rfc-77.md) | :white_check_mark: `COMPLETED`      |
-| 78           | [1.0 Migration](./rfc-78/rfc-78.md) | :hammer_and_wrench: `IN PROGRESS`   |
-| 79           | [Robust handling of spark task retries and failures](./rfc-79/rfc-79.md) | :x: `ABANDONED`                     |
-| 80           | [Column Groups](./rfc-80/rfc-80.md) | :hammer_and_wrench: `IN PROGRESS`   |
-| 81           | [Introduce Primary Key Sorted Table](./rfc-81/rfc-81.md) | :eyes: `UNDER REVIEW`               |
-| 82           | [Concurrent schema evolution detection](./rfc-82/rfc-82.md) | :white_check_mark: `COMPLETED`      |
-| 83           | [Incremental Table Service](./rfc-83/rfc-83.md) | :white_check_mark: `COMPLETED`      |
-| 84           | [Optimized SerDe of `DataStream` in Flink operators](./rfc-84/rfc-84.md) | :white_check_mark: `COMPLETED`      |
-| 85           | [Hudi Issue and Sprint Management in Jira](./rfc-85/rfc-85.md) | :white_check_mark: `COMPLETED`      |
-| 86           | [DataFrame Implementation of HUDI write path](./rfc-86/rfc-86.md) | :eyes: `UNDER REVIEW`               |
-| 87           | [Avro elimination for Flink writer](./rfc-87/rfc-87.md) | :hammer_and_wrench: `IN PROGRESS`   |
-| 88           | [New Schema/DataType/Expression Abstractions](./rfc-88/rfc-88.md) | :eyes: `UNDER REVIEW`               |
-| 89           | [Dynamic Partition Level Bucket Index](./rfc-89/rfc-89.md) | :eyes: `UNDER REVIEW`               |
-| 90           | Add support for cancellable clustering table service plans | :eyes: `UNDER REVIEW`               |
-| 91           | [Storage-based lock provider using conditional writes](./rfc-91/rfc-91.md) | :white_check_mark: `COMPLETED`      |
-| 92           | Support Bitmap Index | :hammer_and_wrench: `IN PROGRESS`   |
-| 93           | [Pluggable Table Formats in Hudi](./rfc-93/rfc-93.md) | :hammer_and_wrench: `IN PROGRESS`   |
-| 94           | Hudi Timeline User Interface (UI) | :eyes: `UNDER REVIEW`               |
-| 95           | [Hudi Flink Source Based on FLIP-27](./rfc-95/rfc-95.md) | :white_check_mark: `COMPLETED`      |
-| 96           | Introduce Unified Bucket Index | :eyes: `UNDER REVIEW`               |
-| 97           | Deprecate Hudi Payload Class Usage | :eyes: `UNDER REVIEW`               |
-| 98           | [Spark Datasource V2 Read](./rfc-98/rfc-98.md) | :eyes: `UNDER REVIEW`               |
-| 99           | [Hudi Type System Redesign](./rfc-99/rfc-99.md) | :eyes: `UNDER REVIEW`               |
-| 100          | [Unstructured Data Storage in Hudi](./rfc-100/rfc-100.md) | :eyes: `UNDER REVIEW`               |
-| 101          | [Updates to the HoodieRecordMerger API](./rfc-101/rfc-101.md) | :hammer_and_wrench: `IN PROGRESS`   |
-| 102          | [Spark Batch Vector Search in Apache Hudi](./rfc-102/rfc-102/md) | :white_check_mark: `COMPLETED`      |
-| 103          | Hudi LSM tree layout | :eyes: `UNDER REVIEW`               |
-| 104          | [Unify schema evolution on schema-on-read](./rfc-104/rfc-104.md) | :eyes: `UNDER REVIEW`               |
-| 105          | [Trino Hudi Connector — Shim/Bundle Refactor](./rfc-105/rfc-105.md) | :eyes: `UNDER REVIEW`               |
-| 106          | Flink RLI for streaming | :eyes: `UNDER REVIEW`               |
+| RFC Number   | Title                                                         
                                                                                
                                                                       | Status 
                             |
+| ------------ 
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 ----------------------------------- |
+| 1            | [CSV Source Support for Delta Streamer](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+01+%3A+CSV+Source+Support+for+Delta+Streamer) | :white_check_mark: `COMPLETED`      |
+| 2            | [ORC Storage in Hudi](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113708439) | :white_check_mark: `COMPLETED`      |
+| 3            | [Timeline Service with Incremental File System View Syncing](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113708965) | :white_check_mark: `COMPLETED`      |
+| 4            | [Faster Hive incremental pull queries](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115513622) | :white_check_mark: `COMPLETED`      |
+| 5            | [HUI (Hudi WebUI)](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130027233) | :x: `ABANDONED`                     |
+| 6            | [Add indexing support to the log file](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+06+%3A+Add+indexing+support+to+the+log+file) | :x: `ABANDONED`                     |
+| 7            | [Point in time Time-Travel queries on Hudi table](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table) | :white_check_mark: `COMPLETED`      |
+| 8            | [Metadata based Record Index](./rfc-8/rfc-8.md) | :white_check_mark: `COMPLETED`      |
+| 9            | [Hudi Dataset Snapshot 
Exporter](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+09+%3A+Hudi+Dataset+Snapshot+Exporter)
                                                                                
       | :white_check_mark: `COMPLETED`      |
+| 10           | [Restructuring and auto-generation of 
docs](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+10+%3A+Restructuring+and+auto-generation+of+docs)
                                                                 | 
:white_check_mark: `COMPLETED`      |
+| 11           | [Refactor of the configuration framework of hudi project](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+11+%3A+Refactor+of+the+configuration+framework+of+hudi+project) | :x: `ABANDONED`                     |
+| 12           | [Efficient Migration of Large Parquet Tables to Apache Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi) | :white_check_mark: `COMPLETED`      |
+| 13           | [Integrate Hudi with Flink](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724520) | :white_check_mark: `COMPLETED`      |
+| 14           | [JDBC incremental puller](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller) | :white_check_mark: `COMPLETED`      |
+| 15           | [HUDI File Listing Improvements](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements) | :white_check_mark: `COMPLETED`      |
+| 16           | [Abstraction for HoodieInputFormat and RecordReader](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+16+Abstraction+for+HoodieInputFormat+and+RecordReader) | :white_check_mark: `COMPLETED`      |
+| 17           | [Abstract common meta sync module support multiple meta service](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+17+Abstract+common+meta+sync+module+support+multiple+meta+service) | :white_check_mark: `COMPLETED`      |
+| 18           | [Insert Overwrite API](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+18+Insert+Overwrite+API) | :white_check_mark: `COMPLETED`      |
+| 19           | [Clustering data for freshness and query performance](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance) | :white_check_mark: `COMPLETED`      |
+| 20           | [handle failed records](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+20+%3A+handle+failed+records) | :arrows_counterclockwise: `ONGOING` |
+| 21           | [Allow HoodieRecordKey to be Virtual](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+21+%3A+Allow+HoodieRecordKey+to+be+Virtual) | :white_check_mark: `COMPLETED`      |
+| 22           | [Snapshot Isolation using Optimistic Concurrency Control for multi-writers](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+22+%3A+Snapshot+Isolation+using+Optimistic+Concurrency+Control+for+multi-writers) | :white_check_mark: `COMPLETED`      |
+| 23           | [Hudi Observability metrics collection](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+23+%3A+Hudi+Observability+metrics+collection) | :x: `ABANDONED`                     |
+| 24           | [Hoodie Flink Writer Proposal](https://cwiki.apache.org/confluence/display/HUDI/RFC-24%3A+Hoodie+Flink+Writer+Proposal) | :white_check_mark: `COMPLETED`      |
+| 25           | [Spark SQL Extension For Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+25%3A+Spark+SQL+Extension+For+Hudi) | :white_check_mark: `COMPLETED`      |
+| 26           | [Optimization For Hudi Table Query](https://cwiki.apache.org/confluence/display/HUDI/RFC-26+Optimization+For+Hudi+Table+Query) | :white_check_mark: `COMPLETED`      |
+| 27           | [Data skipping index to improve query performance](https://cwiki.apache.org/confluence/display/HUDI/RFC-27+Data+skipping+index+to+improve+query+performance) | :white_check_mark: `COMPLETED`      |
+| 28           | [Support Z-order curve](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=181307144) | :white_check_mark: `COMPLETED`      |
+| 29           | [Hash Index](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index) | :white_check_mark: `COMPLETED`      |
+| 30           | [Batch operation](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+30%3A+Batch+operation) | :x: `ABANDONED`                     |
+| 31           | [Hive integration Improvement](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment) | :x: `ABANDONED`                     |
+| 32           | [Kafka Connect Sink for Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC-32+Kafka+Connect+Sink+for+Hudi) | :arrows_counterclockwise: `ONGOING` |
+| 33           | [Hudi supports more comprehensive Schema Evolution](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution) | :white_check_mark: `COMPLETED`      |
+| 34           | [Hudi BigQuery Integration](./rfc-34/rfc-34.md) | :white_check_mark: `COMPLETED`      |
+| 35           | [Make Flink MOR table writing streaming friendly](https://cwiki.apache.org/confluence/display/HUDI/RFC-35%3A+Make+Flink+MOR+table+writing+streaming+friendly) | :white_check_mark: `COMPLETED`      |
+| 36           | [HUDI Metastore Server](https://cwiki.apache.org/confluence/display/HUDI/%5BWIP%5D+RFC-36%3A+HUDI+Metastore+Server) | :arrows_counterclockwise: `ONGOING` |
+| 37           | [Hudi Metadata based Bloom Index](rfc-37/rfc-37.md) | :white_check_mark: `COMPLETED`      |
+| 38           | [Spark Datasource V2 Integration](./rfc-38/rfc-38.md) | :white_check_mark: `COMPLETED`      |
+| 39           | [Incremental source for Debezium](./rfc-39/rfc-39.md) | :white_check_mark: `COMPLETED`      |
+| 40           | [Connector for Trino](./rfc-40/rfc-40.md) | :white_check_mark: `COMPLETED`      |
+| 41           | [Snowflake Integration](./rfc-41/rfc-41.md), supported via [Apache XTable (Incubating)](https://xtable.apache.org/) | :x: `ABANDONED`                     |
+| 42           | [Consistent Hashing Index](./rfc-42/rfc-42.md) | :arrows_counterclockwise: `ONGOING` |
+| 43           | [Table Management Service](./rfc-43/rfc-43.md) | :x: `ABANDONED`                     |
+| 44           | [Hudi Connector for Presto](./rfc-44/rfc-44.md) | :white_check_mark: `COMPLETED`      |
+| 45           | [Asynchronous Metadata Indexing](./rfc-45/rfc-45.md) | :white_check_mark: `COMPLETED`      |
+| 46           | [Optimizing Record Payload Handling](./rfc-46/rfc-46.md) | :white_check_mark: `COMPLETED`      |
+| 47           | [Add Call Produce Command for Spark SQL](./rfc-47/rfc-47.md) | :white_check_mark: `COMPLETED`      |
+| 48           | [LogCompaction for MOR tables](./rfc-48/rfc-48.md) | :white_check_mark: `COMPLETED`      |
+| 49           | [Support sync with DataHub](./rfc-49/rfc-49.md) | :white_check_mark: `COMPLETED`      |
+| 50           | [Improve Timeline Server](./rfc-50/rfc-50.md) | :x: `ABANDONED`                     |
+| 51           | [Change Data Capture](./rfc-51/rfc-51.md) | :arrows_counterclockwise: `ONGOING` |
+| 52           | [Introduce Secondary Index to Improve HUDI Query Performance](./rfc-52/rfc-52.md) | :x: `ABANDONED`                     |
+| 53           | [Use Lock-Free Message Queue Improving Hoodie Writing Efficiency](./rfc-53/rfc-53.md) | :white_check_mark: `COMPLETED`      |
+| 54           | [New Table APIs and Streamline Hudi Configs](./rfc-54/rfc-54.md) | :x: `ABANDONED`                     |
+| 55           | [Improve Hive/Meta sync class design and hierarchies](./rfc-55/rfc-55.md) | :white_check_mark: `COMPLETED`      |
+| 56           | [Early Conflict Detection For Multi-Writer](./rfc-56/rfc-56.md) | :white_check_mark: `COMPLETED`      |
+| 57           | [DeltaStreamer Protobuf Support](./rfc-57/rfc-57.md) | :white_check_mark: `COMPLETED`      |
+| 58           | [Integrate column stats index with all query engines](./rfc-58/rfc-58.md) | :white_check_mark: `COMPLETED`      |
+| 59           | [Multiple event_time Fields Latest Verification in a Single Table](./rfc-59/rfc-59.md) | :eyes: `UNDER REVIEW`               |
+| 60           | [Federated Storage Layer](./rfc-60/rfc-60.md) | :eyes: `UNDER REVIEW`               |
+| 61           | [Snapshot view management](./rfc-61/rfc-61.md) | :eyes: `UNDER REVIEW`               |
+| 62           | [Diagnostic Reporter](./rfc-62/rfc-62.md) | :eyes: `UNDER REVIEW`               |
+| 63           | [Expression Indexes](./rfc-63/rfc-63.md) | :arrows_counterclockwise: `ONGOING` |
+| 64           | [New Hudi Table Spec API for Query Integrations](./rfc-64/rfc-64.md) | :eyes: `UNDER REVIEW`               |
+| 65           | [Partition TTL Management](./rfc-65/rfc-65.md) | :white_check_mark: `COMPLETED`      |
+| 66           | [Non Blocking Concurrency Control](./rfc-66/rfc-66.md) | :white_check_mark: `COMPLETED`      |
+| 67           | [Hudi Bundle Standards](./rfc-67/rfc-67.md) | :white_check_mark: `COMPLETED`      |
+| 68           | [A More Effective HoodieMergeHandler for COW Table with Parquet](./rfc-68/rfc-68.md) | :x: `ABANDONED`                     |
+| 69           | [Hudi 1.x](./rfc-69/rfc-69.md) | :white_check_mark: `COMPLETED`      |
+| 70           | [Hudi Reverse Streamer](./rfc-70/rfc-70.md) | :eyes: `UNDER REVIEW`               |
+| 71           | [Enhance OCC conflict detection](./rfc-71/rfc-71.md) | :eyes: `UNDER REVIEW`               |
+| 72           | [Redesign Hudi-Spark Integration](./rfc-72/rfc-72.md) | :arrows_counterclockwise: `ONGOING` |
+| 73           | [Multi-Table Transactions](./rfc-73/rfc-73.md) | :eyes: `UNDER REVIEW`               |
+| 74           | [`HoodieStorage`: Hudi Storage Abstraction and APIs](./rfc-74/rfc-74.md) | :arrows_counterclockwise: `ONGOING` |
+| 75           | [Hudi-Native HFile Reader and Writer](./rfc-75/rfc-75.md) | :white_check_mark: `COMPLETED`      |
+| 76           | [Auto Record key generation](./rfc-76/rfc-76.md) | :white_check_mark: `COMPLETED`      |
+| 77           | [Secondary Index](./rfc-77/rfc-77.md) | :white_check_mark: `COMPLETED`      |
+| 78           | [1.0 Migration](./rfc-78/rfc-78.md) | :hammer_and_wrench: `IN PROGRESS`   |
+| 79           | [Robust handling of spark task retries and failures](./rfc-79/rfc-79.md) | :x: `ABANDONED`                     |
+| 80           | [Column Groups](./rfc-80/rfc-80.md) | :hammer_and_wrench: `IN PROGRESS`   |
+| 81           | [Introduce Primary Key Sorted Table](./rfc-81/rfc-81.md) | :eyes: `UNDER REVIEW`               |
+| 82           | [Concurrent schema evolution detection](./rfc-82/rfc-82.md) | :white_check_mark: `COMPLETED`      |
+| 83           | [Incremental Table Service](./rfc-83/rfc-83.md) | :white_check_mark: `COMPLETED`      |
+| 84           | [Optimized SerDe of `DataStream` in Flink operators](./rfc-84/rfc-84.md) | :white_check_mark: `COMPLETED`      |
+| 85           | [Hudi Issue and Sprint Management in Jira](./rfc-85/rfc-85.md) | :white_check_mark: `COMPLETED`      |
+| 86           | [DataFrame Implementation of HUDI write path](./rfc-86/rfc-86.md) | :eyes: `UNDER REVIEW`               |
+| 87           | [Avro elimination for Flink writer](./rfc-87/rfc-87.md) | :hammer_and_wrench: `IN PROGRESS`   |
+| 88           | [New Schema/DataType/Expression Abstractions](./rfc-88/rfc-88.md) | :eyes: `UNDER REVIEW`               |
+| 89           | [Dynamic Partition Level Bucket Index](./rfc-89/rfc-89.md) | :eyes: `UNDER REVIEW`               |
+| 90           | Add support for cancellable clustering table service plans | :eyes: `UNDER REVIEW`               |
+| 91           | [Storage-based lock provider using conditional writes](./rfc-91/rfc-91.md) | :white_check_mark: `COMPLETED`      |
+| 92           | Support Bitmap Index | :hammer_and_wrench: `IN PROGRESS`   |
+| 93           | [Pluggable Table Formats in Hudi](./rfc-93/rfc-93.md) | :hammer_and_wrench: `IN PROGRESS`   |
+| 94           | Hudi Timeline User Interface (UI) | :eyes: `UNDER REVIEW`               |
+| 95           | [Hudi Flink Source Based on FLIP-27](./rfc-95/rfc-95.md) | :white_check_mark: `COMPLETED`      |
+| 96           | Introduce Unified Bucket Index | :eyes: `UNDER REVIEW`               |
+| 97           | Deprecate Hudi Payload Class Usage | :eyes: `UNDER REVIEW`               |
+| 98           | [Spark Datasource V2 Read](./rfc-98/rfc-98.md) | :eyes: `UNDER REVIEW`               |
+| 99           | [Hudi Type System Redesign](./rfc-99/rfc-99.md) | :eyes: `UNDER REVIEW`               |
+| 100          | [Unstructured Data Storage in Hudi](./rfc-100/rfc-100.md) | :eyes: `UNDER REVIEW`               |
+| 101          | [Updates to the HoodieRecordMerger API](./rfc-101/rfc-101.md) | :hammer_and_wrench: `IN PROGRESS`   |
+| 102          | [Spark Batch Vector Search in Apache Hudi](./rfc-102/rfc-102.md) | :white_check_mark: `COMPLETED`      |
+| 103          | Hudi LSM tree layout | :eyes: `UNDER REVIEW`               |
+| 104          | [Unify schema evolution on schema-on-read](./rfc-104/rfc-104.md) | :eyes: `UNDER REVIEW`               |
+| 105          | [Trino Hudi Connector — Shim/Bundle Refactor](./rfc-105/rfc-105.md) | :eyes: `UNDER REVIEW`               |
+| 106          | [Record Level and Secondary Index Support for Flink Writers](./rfc-106/rfc-106.md) | :white_check_mark: `COMPLETED`      |
diff --git a/rfc/rfc-106/index-compaction-flow.png b/rfc/rfc-106/index-compaction-flow.png
new file mode 100644
index 000000000000..a81e1d4f1f23
Binary files /dev/null and b/rfc/rfc-106/index-compaction-flow.png differ
diff --git a/rfc/rfc-106/index-write-flow.png b/rfc/rfc-106/index-write-flow.png
new file mode 100644
index 000000000000..6259071a55f6
Binary files /dev/null and b/rfc/rfc-106/index-write-flow.png differ
diff --git a/rfc/rfc-106/rfc-106.md b/rfc/rfc-106/rfc-106.md
new file mode 100644
index 000000000000..d344ccf579f9
--- /dev/null
+++ b/rfc/rfc-106/rfc-106.md
@@ -0,0 +1,260 @@
+   <!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# RFC-106: Record Level and Secondary Index Support for Flink Writers
+
+## Proposers
+
+- @danny0405
+
+## Approvers
+ - @geserdugarov
+ - @vinothchandar
+ - @cshuo
+
+## Status
+
+GH Discussion: https://github.com/apache/hudi/discussions/17452
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+Apache Hudi provides multiple indexing strategies to efficiently locate records during upsert operations.
+The **Record Level Index (RLI)** is a global index stored in Hudi's **Metadata Table (MDT)** that maps each record key to its
+exact file group location, enabling O(1) lookups. The **Secondary Index (SI)** extends this capability to columns other than the record key, whose values need not be unique.
+Currently, Spark supports RLI and SI for both reads and writes while Flink does not, creating a feature gap between the two engines.
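To make the two lookups concrete, here is a minimal in-memory sketch, assuming Java 17. It models the RLI as a map from record key to a (partition path, file group id) location, and the SI as a map from a secondary column value to the set of record keys carrying that value. A secondary-value lookup therefore resolves keys through the SI and then locations through the RLI. All names (`IndexModelSketch`, `Location`, `upsert`) are hypothetical. The real indexes are stored as records in MDT partitions, not as Java maps.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical in-memory model of RLI and SI; the real indexes are MDT records.
final class IndexModelSketch {

  /** Physical location of a record: partition path plus file group id. */
  record Location(String partitionPath, String fileId) {}

  // RLI: record key -> exact file group location (one entry per key).
  private final Map<String, Location> recordIndex = new HashMap<>();
  // SI: secondary column value -> record keys (values need not be unique).
  private final Map<String, Set<String>> secondaryIndex = new HashMap<>();
  // Current secondary value per key, so stale SI entries can be removed.
  private final Map<String, String> secondaryValueOfKey = new HashMap<>();

  void upsert(String recordKey, String secondaryValue, Location location) {
    recordIndex.put(recordKey, location);
    String previous = secondaryValueOfKey.put(recordKey, secondaryValue);
    if (previous != null && !previous.equals(secondaryValue)) {
      // The secondary value changed: drop the key from its old SI entry.
      Set<String> keys = secondaryIndex.get(previous);
      keys.remove(recordKey);
      if (keys.isEmpty()) {
        secondaryIndex.remove(previous);
      }
    }
    secondaryIndex.computeIfAbsent(secondaryValue, v -> new HashSet<>()).add(recordKey);
  }

  /** RLI lookup: a single key-to-location probe. */
  Optional<Location> lookupByKey(String recordKey) {
    return Optional.ofNullable(recordIndex.get(recordKey));
  }

  /** SI lookup: resolve the matching record keys, then their locations via the RLI. */
  Set<Location> lookupBySecondary(String secondaryValue) {
    Set<Location> locations = new HashSet<>();
    for (String key : secondaryIndex.getOrDefault(secondaryValue, Set.of())) {
      lookupByKey(key).ifPresent(locations::add);
    }
    return locations;
  }

  public static void main(String[] args) {
    IndexModelSketch index = new IndexModelSketch();
    index.upsert("order-1", "customer-A", new Location("2026/06/10", "fg-001"));
    index.upsert("order-2", "customer-A", new Location("2026/06/11", "fg-007"));
    System.out.println(index.lookupByKey("order-1"));
    System.out.println(index.lookupBySecondary("customer-A").size()); // 2
  }
}
```

The `secondaryValueOfKey` map exists only to keep the sketch's SI consistent when a record's secondary value changes. Any real SI maintenance has to handle the same case.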
+
+This RFC proposes adding RLI and SI support for Flink streaming writes. Throughout this document, the term **"index"** refers broadly to both RLI and SI;
+when discussing behavior specific to one type, the terms "RLI" or "SI" are used explicitly.
+
+The goals of this RFC are:
+
+- Provide reliable and performant write support for RLI/SI using Flink APIs
+- Ensure cross-engine compatibility so that Flink can access and use indexes written by Spark, and vice versa
+- Support global RLI for cross-partition upserts, as well as partition-level RLI for large fact tables
+- Enable asynchronous compaction of the MDT when indexing is enabled, either within the writer pipeline or via background table services
+- Implement caching of index data for low-latency access during streaming writes
+- Document the scale and write-throughput limits supported by indexing, based on empirical benchmarks
+- Design the implementation to be extensible to secondary indexes on arbitrary columns
+
+## Background
+
+Apache Hudi uses indexes to determine the location of existing records when processing upserts. Without an efficient index, Hudi would
+need to scan the entire table to find whether a record already exists and where it is located. Different index types offer different
+trade-offs between write performance, read performance, and resource consumption.
+
+Currently, the Flink Hudi sink does not support RLI or SI, while the Hudi Spark datasource does, and that support has been proven at [massive production scale](https://hudi.apache.org/blog/2023/11/01/record-level-index/).
+This inconsistency causes friction for users who migrate tables from Spark to Flink streaming. When migrating, users must switch
+the index type from RLI/SI to either `bucket` (a hash-based partitioning scheme) or `flink_state` (which uses Flink's state backend to
+maintain record-to-location mappings). This migration overhead complicates production deployments.
+
+Another key motivation is to provide scalable, efficient support for **cross-partition updates**, that is, scenarios where a record's partition path changes between writes.
+Currently, the only option for handling cross-partition updates in Flink is the `flink_state` index, which maintains a global view of all record locations. However, this approach has significant drawbacks:
+it consumes substantial memory (proportional to the table size) and cannot be shared across different workloads or job restarts without state migration.
+
+## High Level Design
+
+The high-level design introduces the following components:
+
+- **MDT-based Index backend**: A new index implementation that can replace the current `flink_state` index, storing record-to-location mappings in the MDT rather than in Flink's state backend
+- **Index cache with invalidation**: An in-memory cache to accelerate RLI lookups, along with a cache invalidation mechanism to maintain consistency with the committed state of the table
+- **New Flink Index operator**: A separate Flink operator (`IndexWrite`) responsible for writing RLI/SI payloads to the MDT
+- **Synchronous MDT writes**: The MDT's RLI and SI files are written synchronously with the data table files within the same commit boundary; the metadata is then sent to the coordinator for a final commit to the MDT (after the `FILES` partition update is computed)
+- **Asynchronous MDT compaction**: MDT compaction is performed asynchronously, reusing the existing data file compaction pipeline to minimize task slot consumption
+
+![Index Write Flow](./index-write-flow.png)
+
+## Detailed Design
+
+### The Index Access
+
+In Hudi's Flink integration, the `BucketAssigner` operator is responsible for determining where each incoming record should be written.
+It must identify whether each record is an insert (new record), update (existing record), or delete. To make this determination, the operator needs to look up whether
+the record key already exists in the table and, if so, where it is located.
+
+With index support, the `BucketAssigner` operator will use the index metadata stored in the MDT as its backend. It will probe the index
+with incoming record keys to determine the appropriate operation type (insert, update, or delete). In this design, the index serves the same role
+that the `flink_state` index currently serves. Since the existing `BucketAssigner` already supports both **global** and **non-global** index types,
+the global RLI will be used for **global** index configurations, while partitioned RLI will be used for **non-global** configurations.
+
+To optimize index access patterns and avoid caching all index shards in every `BucketAssigner` task, the input records will be shuffled
+by `hash(record_key) % num_index_shards`. This uses the same hashing algorithm as the MDT's index partitioner, ensuring that
+each `BucketAssigner` task only needs to read from a subset of index shards.
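Below is a minimal Java sketch of this routing. It uses `String.hashCode()` only as a stand-in for the MDT index partitioner's hash; a real implementation must reuse the MDT's exact hashing function. It also assumes a hypothetical `shard % parallelism` assignment of shards to `BucketAssigner` subtasks. The sketch shows that each subtask ends up reading only a subset of shards. `IndexShardRouter` and its methods are illustrative names, not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Illustrative routing of record keys to index shards via hash(record_key) % num_index_shards.
 * ASSUMPTION: String.hashCode() stands in for the MDT index partitioner's hash.
 */
final class IndexShardRouter {
    private IndexShardRouter() {}

    /** Maps a record key to an index shard in [0, numIndexShards). */
    static int shardOf(String recordKey, int numIndexShards) {
        // floorMod keeps the shard non-negative even when hashCode() is negative.
        return Math.floorMod(recordKey.hashCode(), numIndexShards);
    }

    /** Hypothetical assignment of index shards to BucketAssigner subtasks. */
    static int subtaskOf(int shard, int parallelism) {
        return shard % parallelism;
    }

    /** For each subtask, the set of index shards it must read for the given keys. */
    static Map<Integer, TreeSet<Integer>> shardsPerSubtask(List<String> keys, int numIndexShards, int parallelism) {
        Map<Integer, TreeSet<Integer>> result = new TreeMap<>();
        for (String key : keys) {
            int shard = shardOf(key, numIndexShards);
            result.computeIfAbsent(subtaskOf(shard, parallelism), t -> new TreeSet<>()).add(shard);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("key-" + i);
        }
        // With 8 shards and 4 subtasks, each subtask reads at most 2 shards.
        System.out.println(shardsPerSubtask(keys, 8, 4));
    }
}
```

With this layout, a subtask's cache only ever holds entries from the shards routed to it, rather than from all shards.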
+
+#### Index Cache
+
+Streaming workloads require low-latency processing of each record to achieve high throughput, so each record lookup against the index
+must be fast. Reading an RLI entry from storage for every record would incur 10+ ms of latency per record and severely limit throughput.
+
+**New index mappings cache:** A separate in-memory cache is also needed for index mappings created during the current checkpoint.
+These mappings are not yet committed to the Hudi table and are therefore invisible to MDT queries. This cache must not be cleared until the
+corresponding checkpoint/instant is committed to Hudi, which indicates that the index payloads have also been committed. This ensures that multiple
+records with the same record key (e.g., an insert of a key followed by an update within the same commit boundary) are routed consistently to the same
+file group, preserving the 1:1 mapping from record key to file group.
+
+The cache stores `key -> location` mappings at the record level; entries are evicted per checkpoint, once the corresponding checkpoint is committed to Hudi.
+(Note that the MDT reader also maintains its own native file-level cache.)
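The per-checkpoint eviction can be illustrated with a minimal sketch of the new-mappings cache. It assumes entries are grouped by the id of the checkpoint that created them, and that Hudi instants are committed in checkpoint order. `CheckpointScopedIndexCache` and its methods are hypothetical, and locations are plain file group id strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/**
 * Hypothetical cache of key -> location mappings created during not-yet-committed
 * checkpoints. Entries are grouped by checkpoint id and evicted only after the
 * corresponding Hudi instant is committed.
 */
final class CheckpointScopedIndexCache {
    // checkpointId -> (recordKey -> fileGroupId), ordered by checkpoint id.
    private final TreeMap<Long, Map<String, String>> byCheckpoint = new TreeMap<>();

    /** Records a mapping created while processing checkpoint {@code checkpointId}. */
    void put(long checkpointId, String recordKey, String fileGroupId) {
        byCheckpoint.computeIfAbsent(checkpointId, id -> new HashMap<>()).put(recordKey, fileGroupId);
    }

    /** Looks up a key, preferring the most recent checkpoint that touched it. */
    Optional<String> get(String recordKey) {
        for (Map<String, String> mappings : byCheckpoint.descendingMap().values()) {
            String location = mappings.get(recordKey);
            if (location != null) {
                return Optional.of(location);
            }
        }
        return Optional.empty();
    }

    /**
     * Called once the Hudi instant for {@code committedCheckpointId} is committed. Its index
     * payloads are now visible in the MDT, so entries from that checkpoint and from all
     * earlier ones (assuming in-order commits) can be dropped.
     */
    void onCheckpointCommitted(long committedCheckpointId) {
        // headMap(..., true) is a view; clearing it removes the entries from byCheckpoint.
        byCheckpoint.headMap(committedCheckpointId, true).clear();
    }

    /** Total number of cached mappings across all pending checkpoints. */
    int size() {
        return byCheckpoint.values().stream().mapToInt(Map::size).sum();
    }
}
```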
+
+The actual index writes occur in the `IndexWrite` operator. The `BucketAssigner` operator performs the cache lookups and MDT queries that determine record locations, and propagates the resolved location downstream.
+The cache is updated for new records and location changes, while the MDT is queried only for the locations of existing keys.
+
+The cache update flow is as follows:
+
+1. Probe the cache for the key. If found, update the cache entry if the location has changed.
+2. If the key is not in the cache, fall back to querying the MDT. If the key exists in the MDT, add it to the cache with its location.
+3. If the key does not exist in the MDT either, add the new key and its assigned location to the cache.
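The three steps above can be sketched as follows. This illustrates the flow under stated assumptions and is not the Hudi implementation:

- The MDT lookup and the file-group assignment are injected as plain functions, which are hypothetical stand-ins.
- A location is modeled as a `partition/fileGroupId` string.
- A "location change" is modeled as the record arriving with a different partition path, as in a cross-partition update under a global index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/**
 * Illustrative BucketAssigner-side lookup: cache first, then the MDT, then a new location.
 * Locations are "partition/fileGroupId" strings. The MDT lookup and the file-group
 * assignment are injected as functions; both are hypothetical stand-ins.
 */
final class IndexLookup {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, Optional<String>> mdtLookup;
    private final Function<String, String> assignFileGroup;
    int mdtQueries = 0;

    IndexLookup(Function<String, Optional<String>> mdtLookup, Function<String, String> assignFileGroup) {
        this.mdtLookup = mdtLookup;
        this.assignFileGroup = assignFileGroup;
    }

    static String partitionOf(String location) {
        return location.substring(0, location.indexOf('/'));
    }

    /** Returns the location that a record with {@code recordKey} in {@code partition} is routed to. */
    String route(String recordKey, String partition) {
        // Step 1: probe the cache; update the entry if the location has changed.
        String cached = cache.get(recordKey);
        if (cached != null) {
            if (!partitionOf(cached).equals(partition)) {
                cached = assignFileGroup.apply(partition);
                cache.put(recordKey, cached);
            }
            return cached;
        }
        // Step 2: fall back to the MDT for keys that are not cached.
        mdtQueries++;
        String location = mdtLookup.apply(recordKey)
            .filter(loc -> partitionOf(loc).equals(partition))
            // Step 3: unknown key (or a cross-partition move): assign a new location.
            .orElseGet(() -> assignFileGroup.apply(partition));
        cache.put(recordKey, location);
        return location;
    }
}
```

After the first lookup, repeated records with the same key are served from the cache, so the MDT is hit at most once per key per cache lifetime.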
+
+#### Index Access Consistency in Failure Cases
+
+Hudi uses a two-phase commit protocol where each Flink checkpoint corresponds to a completed Hudi instant. During a checkpoint, Flink completes the data writes and collects the Hudi commit metadata.
+Once the checkpoint acknowledgment event is received, Flink knows the checkpoint completed successfully, and the corresponding Hudi instant can be committed.
+However, the acknowledgment message is sent asynchronously on a best-effort basis and may be lost in corner cases. A Hudi instant cannot be committed without receiving this acknowledgment.
+
+During job restarts or task failover, there are scenarios where a Flink checkpoint succeeds but the corresponding Hudi instant remains uncommitted due to the two-phase commit mechanism:
+
+1. **Missing acknowledgment**: The acknowledgment message is lost entirely.
+2. **Job restart during commit**: The acknowledgment is received, but the job restarts during the instant commit process:
+   1. All write metadata from a checkpoint is collected in the coordinator and is ready for committing.
+   2. The acknowledgment message is received.
+   3. The coordinator begins committing the instant with the collected write metadata.
+   4. The job restarts or crashes.
+   5. The instant remains uncommitted even though the checkpoint succeeded.
+3. **Task failover before acknowledgment**: A task fails over from a checkpoint before its acknowledgment is received:
+   1. All write metadata from checkpoint `ckp_n` is collected in the coordinator and is ready for committing.
+   2. The checkpoint succeeds, but the acknowledgment has not been received yet.
+   3. A task fails and recovers from `ckp_n`.
+   4. The instant remains uncommitted even though the checkpoint succeeded.
+   5. The acknowledgment message is eventually received.
+   6. During the gap between steps 4 and 5, the `BucketAssigner` must access the uncommitted instant because the checkpoint was successful and the data is valid.
+4. **Task failover after acknowledgment but before commit completion**: A task fails over after the acknowledgment is received but before the instant commit completes:
+   1. All write metadata from checkpoint `ckp_n` is collected in the coordinator and is ready for committing.
+   2. The acknowledgment message is received.
+   3. The coordinator begins committing the instant with the collected write metadata.
+   4. A task fails and recovers from `ckp_n`.
+   5. The instant remains uncommitted even though the checkpoint succeeded.
+   6. Step 3 eventually completes and the instant is committed.
+   7. During the gap between steps 4 and 6, the `BucketAssigner` must access the uncommitted instant because the checkpoint was successful and the data is valid.
+
+For data table (DT) metadata, the pipeline will recommit the instant using the recovered table metadata. However, since the `BucketAssigner` operator is upstream of the `StreamWrite` operator, there is a time gap before these inflight instants can be recommitted.
+This gap between the Flink checkpoint and the Hudi instant commit can make the index read view inconsistent. It is addressed per case:
+
+- On task failover: the coordinator recommits the pending instants whose Flink checkpoints succeeded.
+- On job restart: the coordinator triggers an explicit job failover after the recovered pending instants have been recommitted.
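The recommit rule for the task-failover cases can be sketched with a checkpoint-to-instant map on the coordinator side. This is a hypothetical sketch: `PendingInstants` is not a Hudi class, and "commit" only records the instant. An instant is removed from the map once it is committed, so a late acknowledgment (step 5 of case 3) does not commit it a second time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical coordinator-side bookkeeping of Flink checkpoint id -> pending Hudi instant. */
final class PendingInstants {
    private final TreeMap<Long, String> instantByCheckpoint = new TreeMap<>();
    private final List<String> committed = new ArrayList<>();

    /** Write metadata for {@code instant} was fully collected at checkpoint {@code checkpointId}. */
    void onCheckpointCollected(long checkpointId, String instant) {
        instantByCheckpoint.put(checkpointId, instant);
    }

    /** Normal path: the (best-effort) checkpoint acknowledgment arrived. */
    void onCheckpointComplete(long checkpointId) {
        commitUpTo(checkpointId);
    }

    /**
     * Failover path: tasks restored from {@code restoredCheckpointId}, which is known to be
     * successful, so every pending instant up to it is recommitted even if its acknowledgment
     * was lost or the original commit was interrupted.
     */
    void onRestore(long restoredCheckpointId) {
        commitUpTo(restoredCheckpointId);
    }

    private void commitUpTo(long checkpointId) {
        Map<Long, String> ready = instantByCheckpoint.headMap(checkpointId, true);
        for (String instant : ready.values()) {
            committed.add(instant); // stands in for the actual Hudi (DT + MDT) commit
        }
        ready.clear(); // removes the committed instants from the backing map
    }

    List<String> committed() {
        return committed;
    }

    int pendingCount() {
        return instantByCheckpoint.size();
    }
}
```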
+
+#### Add Event Time Ordering Value for RLI Payload
+
+For cross-partition updates or deletes, we cannot update the RLI directly based on the existing key-location mappings alone, because the RLI payload currently stores only the key-to-location mapping, without the record's ordering value.
+To decide whether an incoming record is a valid update or delete (valid meaning it carries a greater ordering value), we would need to merge it with the stored data records. That is the behavior of the Spark RLI write path, but it is too costly for streaming.
+
+For example, given two records `r1:{key: k1, orderingValue: 2, partition: par1}` and `r2:{key: k1, orderingValue: 1, partition: par2}`, where `r2` arrives after `r1` in a different commit,
+checking key existence alone is not enough. We also need to compare the ordering values to see that `r2` is not a valid update, and therefore
+must not send a retraction (delete) record to partition `par1` to delete the payload.
+
+The suggested solution is to store the ordering value in the RLI payload. When an incoming key matches an existing entry, the ordering values can then be compared to decide whether the incoming record is a valid
+upsert.
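Here is a minimal sketch of the decision this enables. It assumes a numeric (`long`) ordering value and treats an equal ordering value as a valid update. That tie rule is an assumption, since the text above only requires a greater value. `RliEntry` and `Decision` are hypothetical types. A stale record, like `r2` above, yields `IGNORE`, so no retraction is sent to `par1`. The same check would apply to an incoming delete.

```java
/**
 * Illustrative cross-partition decision using an ordering value stored in the RLI payload.
 * ASSUMPTIONS: ordering values are longs, and an equal ordering value counts as a valid update.
 */
final class OrderingCheck {
    enum Decision { INSERT, UPDATE_IN_PLACE, MOVE_PARTITION, IGNORE }

    /** Hypothetical RLI payload: the partition of the current location plus the stored ordering value. */
    static final class RliEntry {
        final String partition;
        final long orderingValue;

        RliEntry(String partition, long orderingValue) {
            this.partition = partition;
            this.orderingValue = orderingValue;
        }
    }

    /**
     * Decides how to handle an incoming record.
     *
     * @param existing the RLI entry for the incoming key, or null if the key is new
     */
    static Decision decide(RliEntry existing, String incomingPartition, long incomingOrderingValue) {
        if (existing == null) {
            return Decision.INSERT;
        }
        if (incomingOrderingValue < existing.orderingValue) {
            // Stale record: no update, and no retraction (delete) record for the old partition.
            return Decision.IGNORE;
        }
        // Same partition: update in place. Different partition: retract from the old
        // partition and write to the new one.
        return existing.partition.equals(incomingPartition) ? Decision.UPDATE_IN_PLACE : Decision.MOVE_PARTITION;
    }
}
```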
+
+The query execution follows this order: first access the in-memory cache, then query the MDT index:
+
+![The RLI Access Pattern](./rli-access-pattern.png)
+
+### Shuffling Index Records
+
+In the `StreamWrite` operator, index records are derived from incoming data records and sent to the `IndexWrite` operator in a streaming
+fashion. These index records are shuffled by `hash(record_key) % num_index_shards`, using the same hashing algorithm as the MDT's
+index partitioner. This shuffling strategy is critical for avoiding a combinatorial explosion of files written to the MDT partition.
+Without it, the number of files would be `N * M`, where `N` is the number of index partition buckets and `M` is the number of data table
+buckets involved in the current write.
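The `N * M` figure can be checked with a small simulation. The model has three assumptions:

- Each writer task opens one MDT file per index shard it receives records for.
- With shuffling, a single `IndexWrite` task owns each shard.
- `String.hashCode()` stands in for the MDT partitioner's hash.

`IndexFileCount` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model counting MDT index files per commit, with and without shuffling by shard. */
final class IndexFileCount {
    static int shardOf(String key, int numShards) {
        return Math.floorMod(key.hashCode(), numShards);
    }

    /** Builds m data buckets, each holding keysPerBucket distinct record keys. */
    static List<List<String>> sampleBuckets(int m, int keysPerBucket) {
        List<List<String>> buckets = new ArrayList<>();
        for (int b = 0; b < m; b++) {
            List<String> keys = new ArrayList<>();
            for (int i = 0; i < keysPerBucket; i++) {
                keys.add("b" + b + "-k" + i);
            }
            buckets.add(keys);
        }
        return buckets;
    }

    /** No shuffle: each data-bucket writer opens one file per index shard it sees. */
    static int filesWithoutShuffle(List<List<String>> keysPerDataBucket, int numShards) {
        Set<String> files = new HashSet<>();
        for (int bucket = 0; bucket < keysPerDataBucket.size(); bucket++) {
            for (String key : keysPerDataBucket.get(bucket)) {
                files.add(bucket + ":" + shardOf(key, numShards));
            }
        }
        return files.size();
    }

    /** Shuffle by shard: one IndexWrite task owns each shard, so one file per shard. */
    static int filesWithShuffle(List<List<String>> keysPerDataBucket, int numShards) {
        Set<Integer> files = new HashSet<>();
        for (List<String> keys : keysPerDataBucket) {
            for (String key : keys) {
                files.add(shardOf(key, numShards));
            }
        }
        return files.size();
    }

    public static void main(String[] args) {
        List<List<String>> buckets = sampleBuckets(10, 200);
        System.out.println("without shuffle: " + filesWithoutShuffle(buckets, 4));
        System.out.println("with shuffle:    " + filesWithShuffle(buckets, 4));
    }
}
```

In this model, `M = 10` data buckets and `N = 4` index shards give 40 index files without shuffling and 4 with it.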
+
+To ensure that each data record and its corresponding index record always belong to the same commit/checkpoint, we leverage
+Flink's barrier alignment mechanism. In Flink, checkpoint barriers flow together with records through the pipeline (see [how-does-state-snapshotting-work](https://nightlies.apache.org/flink/flink-docs-master/docs/learn-flink/fault_tolerance/#how-does-state-snapshotting-work)).
+When the `StreamWrite` operator receives a record, it emits both the data record and its corresponding index record within a single `#processElement` call.
+This ensures that the two records are never separated by a checkpoint barrier.
+
+For example:
+
+```text
+// irN means an index record, drN means a data record
+[ r4 r3 r2 r1 ] => BucketAssigner => [ ir4 dr4 ir3 dr3 ir2 dr2 ir1 dr1 ]
+```
+
+The barrier propagation algorithm prevents the checkpoint barrier from being placed between an index record and its corresponding data record.
+A placement like `[ ir4 dr4 ir3 dr3 ir2 <checkpoint barrier> dr2 ir1 dr1]` cannot occur.
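The argument can be checked with a toy simulation. If barriers are only injected between two `processElement` calls, never inside one, then a data record and its index record always land on the same side of the barrier. `BarrierSimulation` is a hypothetical model, not Flink's barrier implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model: an operator emits a data record and its index record within one
 * processElement call, and a checkpoint barrier can only be injected between calls.
 */
final class BarrierSimulation {
    static final String BARRIER = "|barrier|";

    /** Stand-in for processElement: emits both outputs for one input record "rN". */
    static void processElement(String record, List<String> out) {
        out.add("dr" + record.substring(1)); // data record
        out.add("ir" + record.substring(1)); // index record
    }

    /** Runs inputs through processElement, injecting a barrier before input number {@code barrierAt}. */
    static List<String> run(List<String> inputs, int barrierAt) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            if (i == barrierAt) {
                out.add(BARRIER); // barriers are handled between elements only
            }
            processElement(inputs.get(i), out);
        }
        return out;
    }

    /** True if every drN/irN pair lands on the same side of the barrier. */
    static boolean pairsAligned(List<String> out) {
        int barrierPos = out.indexOf(BARRIER);
        if (barrierPos < 0) {
            return true;
        }
        for (int i = 0; i < out.size(); i++) {
            String s = out.get(i);
            if (s.equals(BARRIER)) {
                continue;
            }
            String partner = (s.startsWith("dr") ? "ir" : "dr") + s.substring(2);
            int j = out.indexOf(partner);
            if ((i < barrierPos) != (j < barrierPos)) {
                return false;
            }
        }
        return true;
    }
}
```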
+
+### The Index Write
+
+In the `IndexWrite` operator, index records are buffered and then written to the MDT when triggered by a Flink checkpoint.
+The write status metadata is then sent to the `coordinator`. This metadata includes two parts:
+
+- **A**: The written data file paths
+- **B**: The written MDT file paths (specifically those under the index partitions)
+
+#### Committing MDT (including Index Partitions)
+
+When committing to the data table, the MDT is committed first with the index write metadata (the MDT index partition file handles).
+The `RLI` and `SI` partition file handles are committed together in the `FILES` partition.
+
+During a Flink checkpoint, each index-writing and data-writing task flushes all its records to the index and data files respectively.
+This ensures that the index and data files are always consistent. Both are committed together from the Coordinator as a single Hudi commit,
+following the current commit protocol.
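Below is a hypothetical sketch of the coordinator side of this protocol. It waits until every `StreamWrite` and `IndexWrite` task has reported its write status for a checkpoint. It then commits the MDT (index partition files together with the `FILES` update) before the data table. `CommitCoordinator` is an illustrative name, and the commits are only appended to a log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical coordinator that commits the MDT before the data table for each checkpoint. */
final class CommitCoordinator {
    private final int expectedTasks;
    private final Map<Long, List<String>> dataFiles = new HashMap<>();  // part A
    private final Map<Long, List<String>> indexFiles = new HashMap<>(); // part B
    private final Map<Long, Integer> reports = new HashMap<>();
    final List<String> commitLog = new ArrayList<>();

    CommitCoordinator(int expectedTasks) {
        this.expectedTasks = expectedTasks;
    }

    /** One write-status event from a StreamWrite or IndexWrite task. */
    void onWriteStatus(long checkpointId, List<String> writtenDataFiles, List<String> writtenIndexFiles) {
        dataFiles.computeIfAbsent(checkpointId, id -> new ArrayList<>()).addAll(writtenDataFiles);
        indexFiles.computeIfAbsent(checkpointId, id -> new ArrayList<>()).addAll(writtenIndexFiles);
        reports.merge(checkpointId, 1, Integer::sum);
    }

    /** Commits on checkpoint acknowledgment; returns false if some task has not reported yet. */
    boolean commit(long checkpointId) {
        if (reports.getOrDefault(checkpointId, 0) < expectedTasks) {
            return false;
        }
        List<String> data = dataFiles.remove(checkpointId);
        List<String> index = indexFiles.remove(checkpointId);
        reports.remove(checkpointId);
        // 1. MDT first: RLI/SI file handles plus the FILES partition update for the new data files.
        commitLog.add("MDT:" + index + " FILES:" + data);
        // 2. Then the data table commit, which completes the instant.
        commitLog.add("DT:" + data);
        return true;
    }
}
```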
+
+To maintain exactly-once semantics during job recovery, the write status metadata must be stored in multiple locations: the `StreamWrite` operator, the `IndexWrite` operator, and the `coordinator`.
+This follows the same pattern as the current approach for maintaining data table metadata.
+
+### The Compaction
+
+To minimize task slot consumption, the implementation reuses the existing data file compaction sub-pipeline for MDT compaction.
+This asynchronous compaction is automatically enabled when indexing is active.
+
+![Index Compaction Flow](./index-compaction-flow.png)
+
+## Implementation Plan 
+
+The umbrella issue: [Support record index for Flink writer](https://github.com/apache/hudi/issues/17647)
+
+- [Add RLI access cache to support efficient lookup](https://github.com/apache/hudi/issues/17697)
+- [Adapt the BucketAssign function to the MDT-backed index](https://github.com/apache/hudi/issues/17699)
+- [Support mini-batch access to the MDT index for bucket assign function](https://github.com/apache/hudi/issues/17842)
+- [Add basic infra to bookkeep the mappings from checkpoint id to instant](https://github.com/apache/hudi/issues/17700)
+- [Add a new index write function](https://github.com/apache/hudi/issues/17701)
+- [Integrate the MDT compaction with existing compaction sub-pipeline and offline job](https://github.com/apache/hudi/issues/17702)
+
+## Rollout/Adoption Plan
+
+- What impact (if any) will there be on existing users?
+  - No impact, because this is a new feature. Existing Flink users can continue using their current index types, and adoption of RLI/SI is optional.
+
+## Test Plan
+
+1. Verify that all write and read scenarios work correctly with indexing enabled.
+2. Validate that asynchronous compaction works correctly with mixed DT/MDT workloads.
+3. Add new tests to cover job recovery scenarios with uncommitted DT/MDT commits.
+4. Conduct benchmarks to determine the upper throughput threshold at which indexing is recommended for streaming workloads.
+
+## Appendix
+
+### The Job/Task failover
+
+To maintain exactly-once semantics, the implementation includes infrastructure to support job and task recovery:
+
+1. In the `coordinator`, historical uncommitted metadata is persisted in the checkpoint state.
+2. In the `StreamWrite` operator, the current write metadata list is persisted in the checkpoint state.
+
+When the job is restarted, the following steps are triggered to recover uncommitted instants:
+
+1. The `StreamWrite` operator checks for pending instants in the checkpoint state and resends an event to the `coordinator` to collect the uncommitted metadata.
+2. The `coordinator` recovers its write metadata list from the checkpoint state.
+3. The `coordinator` recommits the uncommitted instants using the combined metadata from steps 1 and 2.
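Step 3 can be sketched as a merge keyed by instant time, assuming write metadata is modeled as a list of file paths per instant. Metadata restored from the coordinator state and metadata re-sent by the `StreamWrite` tasks are unioned, so duplicates collapse. Only instants not already completed on the timeline are recommitted. `RestartRecovery` is a hypothetical helper, and the recommit only returns a description.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical job-restart recovery: merge recovered write metadata, then recommit pending instants. */
final class RestartRecovery {
    /** Unions metadata restored by the coordinator with metadata re-sent by StreamWrite tasks. */
    static Map<String, Set<String>> merge(Map<String, List<String>> fromCoordinatorState,
                                          List<Map<String, List<String>>> resentByTasks) {
        Map<String, Set<String>> merged = new TreeMap<>(); // instant times sort chronologically
        fromCoordinatorState.forEach((instant, files) ->
            merged.computeIfAbsent(instant, i -> new LinkedHashSet<>()).addAll(files));
        for (Map<String, List<String>> event : resentByTasks) {
            event.forEach((instant, files) ->
                merged.computeIfAbsent(instant, i -> new LinkedHashSet<>()).addAll(files));
        }
        return merged;
    }

    /** Recommits every merged instant that is not yet completed on the timeline. */
    static List<String> recommit(Map<String, Set<String>> merged, Set<String> completedInstants) {
        List<String> recommitted = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : merged.entrySet()) {
            if (!completedInstants.contains(e.getKey())) {
                recommitted.add(e.getKey() + "=" + e.getValue()); // stands in for the actual commit
            }
        }
        return recommitted;
    }
}
```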
diff --git a/rfc/rfc-106/rli-access-pattern.png b/rfc/rfc-106/rli-access-pattern.png
new file mode 100644
index 000000000000..00888f9c5a4a
Binary files /dev/null and b/rfc/rfc-106/rli-access-pattern.png differ
