[jira] [Updated] (HUDI-6788) Integrate FileGroupReader with MergeOnReadInputFormat for Flink

Danny Chen (Jira) Sun, 25 Aug 2024 20:09:15 -0700


     [ https://issues.apache.org/jira/browse/HUDI-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Danny Chen updated HUDI-6788:
-----------------------------
    Description: 
The existing MergeOnReadInputFormat implements different iterators for all 
kinds of read modes: incremental read, read optimized view, snapshot view, etc. 
For better performance and easier code evolution, we can integrate the new 
FileGroupReader. The main difference is that the FileGroupReader encapsulates 
the logic for merging a file slice's log files with its parquet base file, so 
each iterator is relieved of the redundant work of querying the fs view and 
composing the file slices.
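
As a rough illustration of the merging that the FileGroupReader is described 
as encapsulating, here is a minimal sketch in plain Java. Everything in it is 
hypothetical: the Rec type, the ordering field and the "highest ordering value 
wins, null payload means delete" rule are assumptions made for the example, 
not Hudi classes or Hudi's actual merge semantics.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types are Hudi classes.
final class FileSliceMergeSketch {

    /** Simplified record: key, ordering value, payload (null payload marks a delete). */
    static final class Rec {
        final String key;
        final long ordering;
        final String payload;

        Rec(String key, long ordering, String payload) {
            this.key = key;
            this.ordering = ordering;
            this.payload = payload;
        }
    }

    /**
     * Merges base-file records with log records of one file slice.
     * Assumed rule: per key, the record with the highest ordering value wins,
     * and a log record wins a tie against the base record.
     */
    static List<Rec> merge(List<Rec> baseRecords, List<Rec> logRecords) {
        Map<String, Rec> merged = new LinkedHashMap<>();
        for (Rec r : baseRecords) {
            merged.put(r.key, r);
        }
        for (Rec r : logRecords) {
            merged.merge(r.key, r, (old, neu) -> neu.ordering >= old.ordering ? neu : old);
        }
        // Drop keys whose winning record is a delete marker.
        List<Rec> out = new ArrayList<>();
        for (Rec r : merged.values()) {
            if (r.payload != null) {
                out.add(r);
            }
        }
        return out;
    }
}
```

With a reader of this shape, an iterator only hands over the file slice and 
consumes merged records; it no longer needs its own merge loop.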

We can integrate step by step for different read views: 1. snapshot queries 2. 
read optimized queries 3. skip merge queries

For usability and a smooth transition, we should add a flag for the new 
reader; the old code path should be kept for 1 or 2 releases.
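
One way such a flag could gate the two code paths is sketched below. The 
option key, the SplitReader interface and the default of "off" are all 
assumptions for illustration; the issue does not name the actual option or 
its default.

```java
import java.util.Map;

// Hypothetical illustration only; not Hudi or Flink API.
final class ReaderSelectionSketch {

    // Assumed option key; the real flag name is not given in the issue.
    static final String USE_FILE_GROUP_READER = "example.read.use-file-group-reader";

    /** Stand-in for whatever reads the records of one input split. */
    interface SplitReader {
        String describe();
    }

    /** Picks the new reader only when the flag is explicitly enabled. */
    static SplitReader create(Map<String, String> conf) {
        boolean useNewReader =
            Boolean.parseBoolean(conf.getOrDefault(USE_FILE_GROUP_READER, "false"));
        if (useNewReader) {
            return () -> "file-group-reader";
        }
        // Old code path, kept while the new reader stabilizes.
        return () -> "legacy-iterators";
    }
}
```

Keeping the choice in one factory method means the old path can later be 
deleted in a single place.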

The major work items include:

1. implement the HoodieFlinkRecord, akin to the HoodieSparkRecord;

2. implement the Flink-specific FileGroupReader with the HoodieFlinkRecord;

3. Flink implements the snapshot queries using the file group reader;

4. Flink implements the read optimized queries using the file group reader;

5. Flink implements the skip merge queries using the file group reader.

  was:The existing 


> Integrate FileGroupReader with MergeOnReadInputFormat for Flink
> ---------------------------------------------------------------
>
>                 Key: HUDI-6788
>                 URL: https://issues.apache.org/jira/browse/HUDI-6788
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Ethan Guo
>            Assignee: Zhenqiu Huang
>            Priority: Blocker
>             Fix For: 1.0.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
