[ 
https://issues.apache.org/jira/browse/HUDI-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6788:
----------------------------
    Sprint: Hudi 1.0 Sprint 2024/08/26-9/1, Hudi 1.0 Sprint 2024/08/26-9/2  
(was: Hudi 1.0 Sprint 2024/08/26-9/1)

> Integrate FileGroupReader with MergeOnReadInputFormat for Flink
> ---------------------------------------------------------------
>
>                 Key: HUDI-6788
>                 URL: https://issues.apache.org/jira/browse/HUDI-6788
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Ethan Guo
>            Assignee: Zhenqiu Huang
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> The existing MergeOnReadInputFormat implements a different iterator for each 
> read mode: incremental read, read optimized view, snapshot view, etc. 
> For better performance and easier code evolution, we can integrate the new 
> FileGroupReader. The main difference is that the FileGroupReader encapsulates 
> the log and parquet merging logic for a file slice, so each iterator is spared 
> the redundant work of querying the fs view and composing the file slices.
> We can integrate step by step for the different read views: 1. snapshot 
> queries; 2. read optimized queries; 3. skip merge queries.
> For usability and a smooth transition, we should add a flag for the new 
> reader; the old code path should be kept for 1 or 2 releases.
> The major action items include:
> 1. implement the HoodieFlinkRecord, akin to the HoodieSparkRecord;
> 2. implement the Flink specific FileGroupReader with the HoodieFlinkRecord;
> 3. Flink implements the snapshot queries using the file group reader;
> 4. Flink implements the read optimized queries using the file group reader;
> 5. Flink implements the skip merge queries using the file group reader.
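
The flag-gated, step-by-step rollout described above could be sketched roughly as follows. This is only an illustration under assumptions: the class ReaderSelection, the enums ReadView and ReaderPath, and the "migrated views" set are hypothetical names invented for this sketch and are not Hudi classes or configuration keys. It shows a read view going through the new reader only when the flag is on and that view has already been migrated, with every other case falling back to the existing iterators.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: none of these names come from the Hudi code base.
public class ReaderSelection {
    /** Read views mentioned in the issue description. */
    enum ReadView { SNAPSHOT, READ_OPTIMIZED, SKIP_MERGE, INCREMENTAL }

    /** The two code paths: existing iterators vs. the file group reader. */
    enum ReaderPath { LEGACY_ITERATORS, FILE_GROUP_READER }

    private final boolean newReaderEnabled;   // assumed user-facing flag
    private final Set<ReadView> migratedViews; // grows as each step lands

    ReaderSelection(boolean newReaderEnabled, Set<ReadView> migratedViews) {
        this.newReaderEnabled = newReaderEnabled;
        // Defensive copy that also tolerates an empty input set.
        EnumSet<ReadView> copy = EnumSet.noneOf(ReadView.class);
        copy.addAll(migratedViews);
        this.migratedViews = copy;
    }

    /** Picks the new path only if the flag is on and the view is migrated. */
    ReaderPath select(ReadView view) {
        boolean useNew = newReaderEnabled && migratedViews.contains(view);
        return useNew ? ReaderPath.FILE_GROUP_READER : ReaderPath.LEGACY_ITERATORS;
    }

    public static void main(String[] args) {
        // State after the first step: only snapshot queries are migrated.
        ReaderSelection afterStep1 =
            new ReaderSelection(true, EnumSet.of(ReadView.SNAPSHOT));
        for (ReadView view : ReadView.values()) {
            System.out.println(view + " -> " + afterStep1.select(view));
        }
    }
}
```

Keeping the legacy path as the default branch means that turning the flag off restores the old behavior for every view, which matches the intent of keeping the old code path for 1 or 2 releases.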



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
