This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/master by this push:
new 0752f62b5f6 [MINOR] Fix RFC index and updating 1.0 RFCs (#12480)
0752f62b5f6 is described below
commit 0752f62b5f619238e6161fb40dc6b07e0037bb1b
Author: vinoth chandar <[email protected]>
AuthorDate: Thu Dec 12 15:12:10 2024 -0800
[MINOR] Fix RFC index and updating 1.0 RFCs (#12480)
- Made a pass to fix status of all RFCs
- Complete RFC-69
- Redo RFC-77 based on new changes as of final RC
---
rfc/README.md | 166 +++++++++++++++++++++++++--------------------------
rfc/rfc-69/rfc-69.md | 4 +-
rfc/rfc-77/rfc-77.md | 148 +++++++++++++--------------------------------
3 files changed, 126 insertions(+), 192 deletions(-)
diff --git a/rfc/README.md b/rfc/README.md
index 9736a84ed0d..56765a620c8 100644
--- a/rfc/README.md
+++ b/rfc/README.md
@@ -34,87 +34,87 @@ The list of all RFCs can be found here.
> Older RFC content is still
> [here](https://cwiki.apache.org/confluence/display/HUDI/RFC+Process).
-| RFC Number | Title | Status |
-|------------|-------|--------|
-| 1 | [CSV Source Support for Delta Streamer](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+01+%3A+CSV+Source+Support+for+Delta+Streamer) | `COMPLETED` |
-| 2 | [ORC Storage in Hudi](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113708439) | `COMPLETED` |
-| 3 | [Timeline Service with Incremental File System View Syncing](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113708965) | `COMPLETED` |
-| 4 | [Faster Hive incremental pull queries](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115513622) | `COMPLETED` |
-| 5 | [HUI (Hudi WebUI)](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130027233) | `ABANDONED` |
-| 6 | [Add indexing support to the log file](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+06+%3A+Add+indexing+support+to+the+log+file) | `ABANDONED` |
-| 7 | [Point in time Time-Travel queries on Hudi table](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table) | `COMPLETED` |
-| 8 | [Metadata based Record Index](./rfc-8/rfc-8.md) | `COMPLETED` |
-| 9 | [Hudi Dataset Snapshot Exporter](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+09+%3A+Hudi+Dataset+Snapshot+Exporter) | `COMPLETED` |
-| 10 | [Restructuring and auto-generation of docs](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+10+%3A+Restructuring+and+auto-generation+of+docs) | `COMPLETED` |
-| 11 | [Refactor of the configuration framework of hudi project](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+11+%3A+Refactor+of+the+configuration+framework+of+hudi+project) | `ABANDONED` |
-| 12 | [Efficient Migration of Large Parquet Tables to Apache Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi) | `COMPLETED` |
-| 13 | [Integrate Hudi with Flink](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724520) | `COMPLETED` |
-| 14 | [JDBC incremental puller](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller) | `COMPLETED` |
-| 15 | [HUDI File Listing Improvements](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements) | `COMPLETED` |
-| 16 | [Abstraction for HoodieInputFormat and RecordReader](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+16+Abstraction+for+HoodieInputFormat+and+RecordReader) | `COMPLETED` |
-| 17 | [Abstract common meta sync module support multiple meta service](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+17+Abstract+common+meta+sync+module+support+multiple+meta+service) | `COMPLETED` |
-| 18 | [Insert Overwrite API](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+18+Insert+Overwrite+API) | `COMPLETED` |
-| 19 | [Clustering data for freshness and query performance](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance) | `COMPLETED` |
-| 20 | [handle failed records](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+20+%3A+handle+failed+records) | `ONGOING` |
-| 21 | [Allow HoodieRecordKey to be Virtual](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+21+%3A+Allow+HoodieRecordKey+to+be+Virtual) | `COMPLETED` |
+| RFC Number | Title | Status |
+|------------|-------|--------|
+| 1 | [CSV Source Support for Delta Streamer](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+01+%3A+CSV+Source+Support+for+Delta+Streamer) | `COMPLETED` |
+| 2 | [ORC Storage in Hudi](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113708439) | `COMPLETED` |
+| 3 | [Timeline Service with Incremental File System View Syncing](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113708965) | `COMPLETED` |
+| 4 | [Faster Hive incremental pull queries](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115513622) | `COMPLETED` |
+| 5 | [HUI (Hudi WebUI)](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130027233) | `ABANDONED` |
+| 6 | [Add indexing support to the log file](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+06+%3A+Add+indexing+support+to+the+log+file) | `ABANDONED` |
+| 7 | [Point in time Time-Travel queries on Hudi table](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table) | `COMPLETED` |
+| 8 | [Metadata based Record Index](./rfc-8/rfc-8.md) | `COMPLETED` |
+| 9 | [Hudi Dataset Snapshot Exporter](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+09+%3A+Hudi+Dataset+Snapshot+Exporter) | `COMPLETED` |
+| 10 | [Restructuring and auto-generation of docs](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+10+%3A+Restructuring+and+auto-generation+of+docs) | `COMPLETED` |
+| 11 | [Refactor of the configuration framework of hudi project](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+11+%3A+Refactor+of+the+configuration+framework+of+hudi+project) | `ABANDONED` |
+| 12 | [Efficient Migration of Large Parquet Tables to Apache Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi) | `COMPLETED` |
+| 13 | [Integrate Hudi with Flink](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724520) | `COMPLETED` |
+| 14 | [JDBC incremental puller](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller) | `COMPLETED` |
+| 15 | [HUDI File Listing Improvements](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements) | `COMPLETED` |
+| 16 | [Abstraction for HoodieInputFormat and RecordReader](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+16+Abstraction+for+HoodieInputFormat+and+RecordReader) | `COMPLETED` |
+| 17 | [Abstract common meta sync module support multiple meta service](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+17+Abstract+common+meta+sync+module+support+multiple+meta+service) | `COMPLETED` |
+| 18 | [Insert Overwrite API](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+18+Insert+Overwrite+API) | `COMPLETED` |
+| 19 | [Clustering data for freshness and query performance](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance) | `COMPLETED` |
+| 20 | [handle failed records](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+20+%3A+handle+failed+records) | `ONGOING` |
+| 21 | [Allow HoodieRecordKey to be Virtual](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+21+%3A+Allow+HoodieRecordKey+to+be+Virtual) | `COMPLETED` |
| 22 | [Snapshot Isolation using Optimistic Concurrency Control for multi-writers](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+22+%3A+Snapshot+Isolation+using+Optimistic+Concurrency+Control+for+multi-writers) | `COMPLETED` |
-| 23 | [Hudi Observability metrics collection](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+23+%3A+Hudi+Observability+metrics+collection) | `ABANDONED` |
-| 24 | [Hoodie Flink Writer Proposal](https://cwiki.apache.org/confluence/display/HUDI/RFC-24%3A+Hoodie+Flink+Writer+Proposal) | `COMPLETED` |
-| 25 | [Spark SQL Extension For Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+25%3A+Spark+SQL+Extension+For+Hudi) | `COMPLETED` |
-| 26 | [Optimization For Hudi Table Query](https://cwiki.apache.org/confluence/display/HUDI/RFC-26+Optimization+For+Hudi+Table+Query) | `COMPLETED` |
-| 27 | [Data skipping index to improve query performance](https://cwiki.apache.org/confluence/display/HUDI/RFC-27+Data+skipping+index+to+improve+query+performance) | `COMPLETED` |
-| 28 | [Support Z-order curve](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=181307144) | `COMPLETED` |
-| 29 | [Hash Index](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index) | `COMPLETED` |
-| 30 | [Batch operation](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+30%3A+Batch+operation) | `ABANDONED` |
-| 31 | [Hive integration Improvement](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment) | `ONGOING` |
-| 32 | [Kafka Connect Sink for Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC-32+Kafka+Connect+Sink+for+Hudi) | `ONGOING` |
-| 33 | [Hudi supports more comprehensive Schema Evolution](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution) | `COMPLETED` |
-| 34 | [Hudi BigQuery Integration](./rfc-34/rfc-34.md) | `COMPLETED` |
-| 35 | [Make Flink MOR table writing streaming friendly](https://cwiki.apache.org/confluence/display/HUDI/RFC-35%3A+Make+Flink+MOR+table+writing+streaming+friendly) | `UNDER REVIEW` |
-| 36 | [HUDI Metastore Server](https://cwiki.apache.org/confluence/display/HUDI/%5BWIP%5D+RFC-36%3A+HUDI+Metastore+Server) | `ONGOING` |
-| 37 | [Hudi Metadata based Bloom Index](rfc-37/rfc-37.md) | `ONGOING` |
-| 38 | [Spark Datasource V2 Integration](./rfc-38/rfc-38.md) | `COMPLETED` |
-| 39 | [Incremental source for Debezium](./rfc-39/rfc-39.md) | `COMPLETED` |
-| 40 | [Hudi Connector for Trino](./rfc-40/rfc-40.md) | `COMPLETED` |
-| 41 | [Hudi Snowflake Integration](./rfc-41/rfc-41.md) | `IN PROGRESS` |
-| 42 | [Consistent Hashing Index](./rfc-42/rfc-42.md) | `ONGOING` |
-| 43 | [Table Management Service](./rfc-43/rfc-43.md) | `IN PROGRESS` |
-| 44 | [Hudi Connector for Presto](./rfc-44/rfc-44.md) | `COMPLETED` |
-| 45 | [Asynchronous Metadata Indexing](./rfc-45/rfc-45.md) | `COMPLETED` |
-| 46 | [Optimizing Record Payload Handling](./rfc-46/rfc-46.md) | `ONGOING` |
-| 47 | [Add Call Produce Command for Spark SQL](./rfc-47/rfc-47.md) | `COMPLETED` |
-| 48 | [LogCompaction for MOR tables](./rfc-48/rfc-48.md) | `ONGOING` |
-| 49 | [Support sync with DataHub](./rfc-49/rfc-49.md) | `COMPLETED` |
-| 50 | [Improve Timeline Server](./rfc-50/rfc-50.md) | `IN PROGRESS` |
-| 51 | [Change Data Capture](./rfc-51/rfc-51.md) | `ONGOING` |
-| 52 | [Introduce Secondary Index to Improve HUDI Query Performance](./rfc-52/rfc-52.md) | `ONGOING` |
-| 53 | [Use Lock-Free Message Queue Improving Hoodie Writing Efficiency](./rfc-53/rfc-53.md) | `COMPLETED` |
-| 54 | [New Table APIs and Streamline Hudi Configs](./rfc-54/rfc-54.md) | `UNDER REVIEW` |
-| 55 | [Improve Hive/Meta sync class design and hierarchies](./rfc-55/rfc-55.md) | `COMPLETED` |
-| 56 | [Early Conflict Detection For Multi-Writer](./rfc-56/rfc-56.md) | `COMPLETED` |
-| 57 | [DeltaStreamer Protobuf Support](./rfc-57/rfc-57.md) | `COMPLETED` |
-| 58 | [Integrate column stats index with all query engines](./rfc-58/rfc-58.md) | `UNDER REVIEW` |
-| 59 | [Multiple event_time Fields Latest Verification in a Single Table](./rfc-59/rfc-59.md) | `UNDER REVIEW` |
-| 60 | [Federated Storage Layer](./rfc-60/rfc-60.md) | `IN PROGRESS` |
-| 61 | [Snapshot view management](./rfc-61/rfc-61.md) | `UNDER REVIEW` |
-| 62 | [Diagnostic Reporter](./rfc-62/rfc-62.md) | `UNDER REVIEW` |
-| 63 | [Expression Indexes](./rfc-63/rfc-63.md) | `UNDER REVIEW` |
-| 64 | [New Hudi Table Spec API for Query Integrations](./rfc-64/rfc-64.md) | `UNDER REVIEW` |
-| 65 | [Partition TTL Management](./rfc-65/rfc-65.md) | `UNDER REVIEW` |
-| 66 | [Lockless Multi-Writer Support](./rfc-66/rfc-66.md) | `UNDER REVIEW` |
-| 67 | [Hudi Bundle Standards](./rfc-67/rfc-67.md) | `UNDER REVIEW` |
-| 68 | [A More Effective HoodieMergeHandler for COW Table with Parquet](./rfc-68/rfc-68.md) | `UNDER REVIEW` |
-| 69 | [Hudi 1.x](./rfc-69/rfc-69.md) | `UNDER REVIEW` |
-| 70 | [Hudi Reverse Streamer](./rfc/rfc-70/rfc-70.md) | `UNDER REVIEW` |
-| 71 | [Enhance OCC conflict detection](./rfc/rfc-71/rfc-71.md) | `UNDER REVIEW` |
-| 72 | [Redesign Hudi-Spark Integration](./rfc/rfc-72/rfc-72.md) | `ONGOING` |
-| 73 | [Multi-Table Transactions](./rfc-73/rfc-73.md) | `UNDER REVIEW` |
-| 74 | [`HoodieStorage`: Hudi Storage Abstraction and APIs](./rfc-74/rfc-74.md) | `UNDER REVIEW` |
-| 75 | [Hudi-Native HFile Reader and Writer](./rfc-75/rfc-75.md) | `UNDER REVIEW` |
-| 76 | [Auto Record key generation](./rfc-76/rfc-76.md) | `IN PROGRESS` |
-| 77 | [Secondary Index](./rfc-77/rfc-77.md) | `UNDER REVIEW` |
-| 78 | [1.0 Migration](./rfc-78/rfc-78.md) | `IN PROGRESS` |
-| 79 | [Robust handling of spark task retries and failures](./rfc-79/rfc-79.md) | `IN PROGRESS` |
-| 80 | [Column Families](./rfc-80/rfc-80.md) | `UNDER REVIEW` |
-| 81 | [Log Compaction with Merge Sort](./rfc-81/rfc-81.md) | `UNDER REVIEW` |
-| 82 | [Concurrent schema evolution detection](./rfc-82/rfc-82.md) | `UNDER REVIEW` |
+| 23 | [Hudi Observability metrics collection](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+23+%3A+Hudi+Observability+metrics+collection) | `ABANDONED` |
+| 24 | [Hoodie Flink Writer Proposal](https://cwiki.apache.org/confluence/display/HUDI/RFC-24%3A+Hoodie+Flink+Writer+Proposal) | `COMPLETED` |
+| 25 | [Spark SQL Extension For Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+25%3A+Spark+SQL+Extension+For+Hudi) | `COMPLETED` |
+| 26 | [Optimization For Hudi Table Query](https://cwiki.apache.org/confluence/display/HUDI/RFC-26+Optimization+For+Hudi+Table+Query) | `COMPLETED` |
+| 27 | [Data skipping index to improve query performance](https://cwiki.apache.org/confluence/display/HUDI/RFC-27+Data+skipping+index+to+improve+query+performance) | `COMPLETED` |
+| 28 | [Support Z-order curve](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=181307144) | `COMPLETED` |
+| 29 | [Hash Index](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index) | `COMPLETED` |
+| 30 | [Batch operation](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+30%3A+Batch+operation) | `ABANDONED` |
+| 31 | [Hive integration Improvement](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment) | `ONGOING` |
+| 32 | [Kafka Connect Sink for Hudi](https://cwiki.apache.org/confluence/display/HUDI/RFC-32+Kafka+Connect+Sink+for+Hudi) | `ONGOING` |
+| 33 | [Hudi supports more comprehensive Schema Evolution](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution) | `COMPLETED` |
+| 34 | [Hudi BigQuery Integration](./rfc-34/rfc-34.md) | `COMPLETED` |
+| 35 | [Make Flink MOR table writing streaming friendly](https://cwiki.apache.org/confluence/display/HUDI/RFC-35%3A+Make+Flink+MOR+table+writing+streaming+friendly) | `COMPLETED` |
+| 36 | [HUDI Metastore Server](https://cwiki.apache.org/confluence/display/HUDI/%5BWIP%5D+RFC-36%3A+HUDI+Metastore+Server) | `ONGOING` |
+| 37 | [Hudi Metadata based Bloom Index](rfc-37/rfc-37.md) | `COMPLETED` |
+| 38 | [Spark Datasource V2 Integration](./rfc-38/rfc-38.md) | `COMPLETED` |
+| 39 | [Incremental source for Debezium](./rfc-39/rfc-39.md) | `COMPLETED` |
+| 40 | [Connector for Trino](./rfc-40/rfc-40.md) | `COMPLETED` |
+| 41 | [Snowflake Integration](./rfc-41/rfc-41.md), supported via [Apache XTable (Incubating)](https://xtable.apache.org/) | `ABANDONED` |
+| 42 | [Consistent Hashing Index](./rfc-42/rfc-42.md) | `ONGOING` |
+| 43 | [Table Management Service](./rfc-43/rfc-43.md) | `ONGOING` |
+| 44 | [Hudi Connector for Presto](./rfc-44/rfc-44.md) | `COMPLETED` |
+| 45 | [Asynchronous Metadata Indexing](./rfc-45/rfc-45.md) | `COMPLETED` |
+| 46 | [Optimizing Record Payload Handling](./rfc-46/rfc-46.md) | `COMPLETED` |
+| 47 | [Add Call Produce Command for Spark SQL](./rfc-47/rfc-47.md) | `COMPLETED` |
+| 48 | [LogCompaction for MOR tables](./rfc-48/rfc-48.md) | `COMPLETED` |
+| 49 | [Support sync with DataHub](./rfc-49/rfc-49.md) | `COMPLETED` |
+| 50 | [Improve Timeline Server](./rfc-50/rfc-50.md) | `IN PROGRESS` |
+| 51 | [Change Data Capture](./rfc-51/rfc-51.md) | `ONGOING` |
+| 52 | [Introduce Secondary Index to Improve HUDI Query Performance](./rfc-52/rfc-52.md) | `ABANDONED` |
+| 53 | [Use Lock-Free Message Queue Improving Hoodie Writing Efficiency](./rfc-53/rfc-53.md) | `COMPLETED` |
+| 54 | [New Table APIs and Streamline Hudi Configs](./rfc-54/rfc-54.md) | `UNDER REVIEW` |
+| 55 | [Improve Hive/Meta sync class design and hierarchies](./rfc-55/rfc-55.md) | `COMPLETED` |
+| 56 | [Early Conflict Detection For Multi-Writer](./rfc-56/rfc-56.md) | `COMPLETED` |
+| 57 | [DeltaStreamer Protobuf Support](./rfc-57/rfc-57.md) | `COMPLETED` |
+| 58 | [Integrate column stats index with all query engines](./rfc-58/rfc-58.md) | `UNDER REVIEW` |
+| 59 | [Multiple event_time Fields Latest Verification in a Single Table](./rfc-59/rfc-59.md) | `UNDER REVIEW` |
+| 60 | [Federated Storage Layer](./rfc-60/rfc-60.md) | `UNDER REVIEW` |
+| 61 | [Snapshot view management](./rfc-61/rfc-61.md) | `UNDER REVIEW` |
+| 62 | [Diagnostic Reporter](./rfc-62/rfc-62.md) | `UNDER REVIEW` |
+| 63 | [Expression Indexes](./rfc-63/rfc-63.md) | `ONGOING` |
+| 64 | [New Hudi Table Spec API for Query Integrations](./rfc-64/rfc-64.md) | `UNDER REVIEW` |
+| 65 | [Partition TTL Management](./rfc-65/rfc-65.md) | `UNDER REVIEW` |
+| 66 | [Non Blocking Concurrency Control](./rfc-66/rfc-66.md) | `UNDER REVIEW` |
+| 67 | [Hudi Bundle Standards](./rfc-67/rfc-67.md) | `UNDER REVIEW` |
+| 68 | [A More Effective HoodieMergeHandler for COW Table with Parquet](./rfc-68/rfc-68.md) | `UNDER REVIEW` |
+| 69 | [Hudi 1.x](./rfc-69/rfc-69.md) | `COMPLETED` |
+| 70 | [Hudi Reverse Streamer](./rfc/rfc-70/rfc-70.md) | `UNDER REVIEW` |
+| 71 | [Enhance OCC conflict detection](./rfc/rfc-71/rfc-71.md) | `UNDER REVIEW` |
+| 72 | [Redesign Hudi-Spark Integration](./rfc/rfc-72/rfc-72.md) | `ONGOING` |
+| 73 | [Multi-Table Transactions](./rfc-73/rfc-73.md) | `UNDER REVIEW` |
+| 74 | [`HoodieStorage`: Hudi Storage Abstraction and APIs](./rfc-74/rfc-74.md) | `ONGOING` |
+| 75 | [Hudi-Native HFile Reader and Writer](./rfc-75/rfc-75.md) | `IN PROGRESS` |
+| 76 | [Auto Record key generation](./rfc-76/rfc-76.md) | `IN PROGRESS` |
+| 77 | [Secondary Index](./rfc-77/rfc-77.md) | `ONGOING` |
+| 78 | [1.0 Migration](./rfc-78/rfc-78.md) | `IN PROGRESS` |
+| 79 | [Robust handling of spark task retries and failures](./rfc-79/rfc-79.md) | `IN PROGRESS` |
+| 80 | [Column Families](./rfc-80/rfc-80.md) | `UNDER REVIEW` |
+| 81 | [Log Compaction with Merge Sort](./rfc-81/rfc-81.md) | `UNDER REVIEW` |
+| 82 | [Concurrent schema evolution detection](./rfc-82/rfc-82.md) | `UNDER REVIEW` |
diff --git a/rfc/rfc-69/rfc-69.md b/rfc/rfc-69/rfc-69.md
index 7e2820fa1b4..4ca3cb95546 100644
--- a/rfc/rfc-69/rfc-69.md
+++ b/rfc/rfc-69/rfc-69.md
@@ -26,7 +26,7 @@
## Status
-Under Review
+Completed
## Abstract
@@ -164,7 +164,7 @@ JIRA Issues Filter for 1.0: [link](https://issues.apache.org/jira/issues/?filter
| Presto/Trino queries | Change Presto/Trino connectors to work with new format, integrate fully with metadata | | [HUDI-3210](https://issues.apache.org/jira/browse/HUDI-4394), [HUDI-4394](https://issues.apache.org/jira/browse/HUDI-4394), [HUDI-4552](https://issues.apache.org/jira/browse/HUDI-4552) |
-## Follow-on/1.1 Release
+## Follow-on releases
The RFC feedback process has generated some awesome new ideas, and we propose to have the following be taken up post 1.0 release,
for easy sequencing of these projects. However, contributors can feel free to drive these JIRAs/designs as they see fit.
diff --git a/rfc/rfc-77/rfc-77.md b/rfc/rfc-77/rfc-77.md
index dd488033ecb..1b69e8c8ec4 100644
--- a/rfc/rfc-77/rfc-77.md
+++ b/rfc/rfc-77/rfc-77.md
@@ -19,7 +19,6 @@
## Proposers
-- @bhat-vinay
- @codope
## Approvers
@@ -36,7 +35,7 @@ JIRA: https://issues.apache.org/jira/browse/HUDI-7146
In this RFC, we propose implementing Secondary Indexes (SI), a new capability in Hudi's metadata table (MDT) based
indexing system. SI are indexes defined on user specified columns of the table. Similar to record level indexes,
-SI will improve query performance when the query predicate contains secondary keys. The number of files
+SI will improve query performance when the query predicate involves those secondary columns. The number of files
that a query needs to scan can be pruned down using secondary indexes.
## Background
@@ -60,6 +59,7 @@ not in the scope of this RFC.
## Design and Implementation
This section discusses briefly the goals, design, implementation details of supporting SI in Hudi. At a high level,
the design principle and goals are as follows:
+
1. User specifies SI to be built on a given column of a table. A given SI can be built on only one column of the table
   (i.e composite keys are not allowed). Any number of SI can be built on a Hudi table. The indexes to be built are
   specified using regular SQL statements.
@@ -72,9 +72,10 @@ indexes.
SI can be created using the regular `CREATE INDEX` SQL statement.
```
-- PROPOSED SYNTAX WITH `secondary_index` as the index type --
-CREATE INDEX [IF NOT EXISTS] index_name ON [TABLE] table_name [USING secondary_index](index_column)
+CREATE INDEX [IF NOT EXISTS] index_name ON [TABLE] table_name [USING index_type](index_column)
-- Examples --
-CREATE INDEX idx_city on hudi_table USING secondary_index(city)
+CREATE INDEX idx_city on hudi_table USING bloom_filters(city)
+-- Default is to create a hash based record level index mapping secondary column to RLI entries.
CREATE INDEX idx_last_name on hudi_table (last_name)
-- NO CHANGE IN DROP INDEX --
@@ -88,7 +89,7 @@ in MDT by prefixing `secondary_index_`. If the `index_name` is `idx_city`, then
The index_type will be `secondary_index`. This will be used to distinguish SI from other Functional Indexes.
### Secondary Index Metadata
-Secondary index metadata will be managed the same way as Functional Index metadata. Since SI will not have any function
+Secondary index metadata will be managed the same way as Expression Index metadata. If the SI does not have any function
to be applied on each row, the `function_name` will be NULL.
### Index in Metadata Table (MDT)
@@ -97,7 +98,7 @@ prefixing `secondary_index_`. Each entry in the SI partition will be a mapping of
`secondary_key -> record_key`. `secondary_key` will form the "record key" for the record of the SI partition. Note that
an important design consideration here is that users may choose to build SI on a non-unique column of the table.
-#### Index Initialisation
+#### Index Initialization
Initial build of the secondary index will scan all file slices (of the base table) to extract
`secondary-key -> record-key` tuple and write it into the secondary index partition in the metadata table.
This is similar to how RLI is initialised.
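The initial build described above can be sketched as follows. This is an editor's illustration only: `BaseRecord` and `buildIndex` are simplified stand-ins for Hudi's file-slice readers and metadata writer, not actual Hudi APIs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch: scan base-table rows and emit the SI tuples that
// would be written to the secondary index partition of the MDT.
public class SecondaryIndexBootstrap {

  // Minimal stand-in for a row read from a base-table file slice.
  public record BaseRecord(String recordKey, Map<String, String> columns) {}

  // Emit one `secondary-key -> record-key` tuple per row, analogous to the
  // initial RLI build described above.
  public static List<Map.Entry<String, String>> buildIndex(
      List<BaseRecord> fileSliceRecords, String indexedColumn) {
    return fileSliceRecords.stream()
        .map(r -> new SimpleEntry<String, String>(r.columns().get(indexedColumn), r.recordKey()))
        .collect(Collectors.toList());
  }
}
```

Note that nothing here enforces uniqueness of the secondary key; two rows with the same column value simply produce two tuples, which is exactly the non-unique case the rest of the design must handle.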
@@ -107,10 +108,9 @@ The index needs to be updated on inserts, updates and deletes to the base table.
the base table could be non-unique, this process differs significantly compared to RLI.
##### Inserts (on the base table)
-Newly inserted row's record-key and secondary-key is required to build the secondary-index entry. The record key is
-already stored in the `WriteStatus` and commit metadata has the files touched by that commit. `WriteStatus` will be enhanced to store the secondary-key values (for all
-those columns on which secondary index is defined). The metadata writer will extract this information and write it out
-to the secondary index partition. [1]
+Newly inserted row's record-key and secondary-key are required to build the secondary-index entry. The commit metadata has the files affected by that commit.
+The metadata writer will extract the newly written records based on the commit metadata and generate the secondary-key values (for all
+those columns on which secondary index is defined) to the secondary index partition. [1]
##### Updates (on the base table)
Similar to inserts, the `secondary-key -> record-key` tuples are extracted from the WriteStatus. However, additional
@@ -131,7 +131,7 @@ Another key observation here is that `old-secondary-key` is required to construc
data systems, Hudi does not read the old-image of a row on updates until a merge is executed. It detects that a row is getting updated by simply
reading the index and appending the updates in log files. Hence, there needs to be a mechanism to extract `old-secondary-key`. We propose
`old-secondary-key` to be extracted by scanning the MDT partition (hosting the SI) and doing a reverse lookup based
-on the `record-key` of the row being updated. It should be noted that this is going to be expensive operation as the
+on the `record-key` of the row being updated. It should be noted that this might be an expensive operation as the
base table grows in size (which inherently means that SI will grow in size) in terms of number of rows. One way to
optimize this is to build a reverse mapping `record-key -> secondary-key` in a different MDT partition. This is
left as a TBD (as of this writing).
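The reverse lookup proposed above can be sketched like this. It is an editor's illustration under stated assumptions: the in-memory list stands in for a scan of the MDT partition, and `ReverseLookup`/`SiEntry` are hypothetical names, not Hudi classes.

```java
import java.util.List;
import java.util.Optional;

// Sketch of the proposed reverse lookup: scan SI tuples to recover the old
// secondary key of the record being updated.
public class ReverseLookup {

  // One `secondary-key -> record-key` entry of the SI partition.
  public record SiEntry(String secondaryKey, String recordKey) {}

  // A linear pass over the SI partition; this full scan is exactly the cost
  // the text flags, motivating a record-key -> secondary-key reverse mapping.
  public static Optional<String> oldSecondaryKey(List<SiEntry> siPartition, String recordKey) {
    return siPartition.stream()
        .filter(e -> e.recordKey().equals(recordKey))
        .map(SiEntry::secondaryKey)
        .findFirst();
  }
}
```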
@@ -153,8 +153,8 @@ records is used to identify candidate records that need to be merged. The 'key'
`record-key` and by definition it is unique. But, the keys for secondary index entries are the `secondary-keys` which
can be non-unique. Hence, the merging of SI entries will make use of the payload i.e `record-key` in the
`secondary-key -> record-key` tuple to identify candidate records that need to be merged. It will also be guided by the
-tombstone record emitted during update or deletes. An example is provided here on how the different log files are merged and how the merged log
-records are finally merged with the base file to obtain the merged records (of the MDT partition hosting SI).
+tombstone record emitted during update or deletes. An example is provided here on how the different log files are merged
+and how the merged log records are finally merged with the base file to obtain the merged records (of the MDT partition hosting SI).
Consider the following table, `trips_table`. Note that this table is only used to illustrate the merging logic and not
to be used as a definitive table for other consideration (for example, the performance aspect of some of the algorithm
@@ -215,98 +215,11 @@ on `secondary-key` and second search will be based on `record-key`. Hence, a sin
uses a flat array will not be efficient.
4. Should allow for efficient insertion of records (for inserting merged record and for buffering fresh records).
-The [initial POC](https://github.com/apache/hudi/pull/10625) makes use of an in-memory nested maps - with the first level keyed by `secondary-key`
-and the second level keyed by `record-key`. However, the final design should allow spilling to disk.
-
-Considering the above requirements, the proposal is to introduce a new class hierarchy for handling merge keys in a more
-flexible and decoupled manner. It adds the `HoodieMergeKey` interface, along with two
-implementations: `HoodieSimpleMergeKey` and `HoodieCompositeMergeKey`.
-```java
-public interface HoodieMergeKey extends Serializable {
-
-  /**
-   * Get the partition path.
-   */
-  String getPartitionPath();
-
-  /**
-   * Get the record key.
-   */
-  Serializable getRecordKey();
-
-  /**
-   * Get the hoodie key.
-   * For simple merge keys, this is used to directly fetch the HoodieKey, which is a combination of record key and partition path.
-   */
-  default HoodieKey getHoodieKey() {
-    return new HoodieKey(getRecordKey().toString(), getPartitionPath());
-  }
-}
-```
-
-`HoodieSimpleMergeKey` simply wraps `HoodieKey` for existing scenarios where the key is a
-string. `HoodieCompositeMergeKey` allows for complex types as keys, enhancing flexibility for scenarios where a simple
-string key is not sufficient.
-
-```java
-public class HoodieSimpleMergeKey implements HoodieMergeKey {
-
-  private final HoodieKey simpleKey;
-
-  public HoodieSimpleMergeKey(HoodieKey simpleKey) {
-    this.simpleKey = simpleKey;
-  }
-
-  @Override
-  public String getPartitionPath() {
-    return simpleKey.getPartitionPath();
-  }
-
-  @Override
-  public Serializable getRecordKey() {
-    return simpleKey.getRecordKey();
-  }
-
-  public HoodieKey getHoodieKey() {
-    return simpleKey;
-  }
-}
-
-public class HoodieCompositeMergeKey<K extends Serializable> implements HoodieMergeKey {
-
-  private final K compositeKey;
-  private final String partitionPath;
-
-  public HoodieCompositeMergeKey(K compositeKey, String partitionPath) {
-    this.compositeKey = compositeKey;
-    this.partitionPath = partitionPath;
-  }
-
-  @Override
-  public String getPartitionPath() {
-    return partitionPath;
-  }
-
-  @Override
-  public Serializable getRecordKey() {
-    return compositeKey;
-  }
-}
-```
-
-We also introduce a new `HoodieRecordMerger` implementation based on `HoodieMergeKey`. For other keys, it falls back to
-merge method of parent class. The new record merger will be used in `HoodieMergedLogRecordScanner` to merge records from
-MDT partition hosting SI.
-
-The primary advantage of this approach is that we do not leak any details to the upper layers such as merge handle.
-However, `HoodieMetadataLogRecordReader` should create the `HoodieMergedLogRecordScanner` with the
-correct `HoodieRecordMerger` implementation, instead of
-the [current record merger](https://github.com/apache/hudi/blob/cb6eb6785fdeb88e66016a2b8c0c6e6fa184b309/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataLogRecordReader.java#L156).
-
-These changes do not affect existing functionalities that do not rely on merge keys. It introduces additional classes
-that are used explicitly for new functionalities involving various key types in merging operations. This ensures minimal
-to no risk for existing processes.
+This is achieved by the following efficient encoding of the reverse mapping of secondary values to record keys in the
+MDT partition. We exploit a key observation that it's enough to merge SI entries and tombstones with a tuple key
+`{secondaryKey, recordKey}` through the existing spillable/merge map implementation. We store a flattened version of the
+logical multimap as a key-value format with key `secondaryKey+recordKey` and value `isDeleted: false|true` indicating
+whether this is a tombstone or an insert of the SI entry.
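The flattened encoding and tombstone-guided merge described above can be sketched as follows. This is a minimal editor's sketch: the class name, the separator choice, and the in-memory `TreeMap` (standing in for the spillable/merge map) are illustrative assumptions, not Hudi's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Flattened multimap: key `secondaryKey + SEP + recordKey` -> isDeleted flag.
public class SecondaryIndexMap {
  private static final char SEP = '\u0000'; // separator assumed absent from key values

  private final TreeMap<String, Boolean> entries = new TreeMap<>();

  private static String key(String secondaryKey, String recordKey) {
    return secondaryKey + SEP + recordKey;
  }

  // An insert on the base table adds a live SI entry.
  public void insert(String secondaryKey, String recordKey) {
    entries.put(key(secondaryKey, recordKey), false);
  }

  // An update/delete emits a tombstone for the old secondary key value.
  public void tombstone(String oldSecondaryKey, String recordKey) {
    entries.put(key(oldSecondaryKey, recordKey), true);
  }

  // Merge newer log entries over older ones: the later write wins per tuple
  // key, mirroring how merged log records are applied over the base file.
  public void mergeFrom(SecondaryIndexMap newer) {
    entries.putAll(newer.entries);
  }

  // Lookup: range-scan the flattened keys sharing the secondaryKey prefix,
  // skip tombstones, and strip the prefix to recover the record keys.
  public List<String> lookup(String secondaryKey) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Boolean> e
        : entries.subMap(secondaryKey + SEP, secondaryKey + SEP + Character.MAX_VALUE).entrySet()) {
      if (!e.getValue()) {
        result.add(e.getKey().substring(secondaryKey.length() + 1));
      }
    }
    return result;
  }
}
```

Because the tuple key is unique even for non-unique secondary values, ordinary last-writer-wins merging suffices, which is the observation the paragraph above relies on.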
### Comparing alternate design proposals
@@ -315,8 +228,7 @@ Here are some alternate options that we considered:
1. Extend Hudi's `ExternalSpillableMap` to support multi-map. More significant refactoring is required and it would have
   leaked implementation details to the write handle layer, as the records held by `ExternalSpillableMap` is exposed to
   write handle via `HoodieMergedLogRecordScanner::getRecords`.
-2. Write spillable version
-   of [Guava's multi-map](https://github.com/google/guava/wiki/NewCollectionTypesExplained#multimap). Apart from reason
+2. Write spillable version of [Guava's multi-map](https://github.com/google/guava/wiki/NewCollectionTypesExplained#multimap). Apart from reason
   mentioned above, we did not want to add a third-party dependency on Guava.
3. Use [Chronicle map](https://github.com/OpenHFT/Chronicle-Map). Same reasons as above.
4. Use two different spillable data structures - one is a set of `secondary-key` and the other is map of
@@ -332,6 +244,28 @@ result set.
3. Indexing strategy should be accompanied by performance test results showing its benefits on the query path (and optionally
   overhead on the index maintenance (write) path)
+### SparkSQL Benchmark
+
+Benchmarks of the implementation show some impressive gains.
+
+Table used - web\_sales (from 10 TB tpc-ds dataset)
+Total File groups - 286,603
+Total Records - 7,198,162,544
+Cardinality of Secondary index column ~ 1:150 (not too high, not too low)
+
+| Run | Query latency w/o data skipping (secs) | Query latency w/ Data Skipping (leveraging SI) (secs) | Improvement |
+| --- | --- | --- | --- |
+| 1 | 252 | 31 | ~88% |
+| 2 | 214 | 10 | ~95% |
+| 3 | 204 | 9 | ~95% |
+
+| | Stats w/o data skipping | Stats w/ Data Skipping (leveraging SI) |
+| --- | --- | --- |
+| Number of Files read | 286,603 | 150 |
+| Size of files read | 759 GB | 593 MB |
+| Total Number of Rows read | 7,198,162,544 | 5,811,187 |
+
+
## Future enhancements and roadmap
The feature can evolve to provide additional functionalities.