[
https://issues.apache.org/jira/browse/HUDI-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-6788:
----------------------------
Sprint: Hudi 1.0 Sprint 2024/08/26-9/1, Hudi 1.0 Sprint 2024/08/26-9/2
(was: Hudi 1.0 Sprint 2024/08/26-9/1)
> Integrate FileGroupReader with MergeOnReadInputFormat for Flink
> ---------------------------------------------------------------
>
> Key: HUDI-6788
> URL: https://issues.apache.org/jira/browse/HUDI-6788
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Ethan Guo
> Assignee: Zhenqiu Huang
> Priority: Blocker
> Fix For: 1.0.0
>
>
> The existing MergeOnReadInputFormat implements different iterators for all
> kinds of read modes: incremental read, read optimized view, snapshot view, etc.
> For better performance and easier code evolution, we can integrate the new
> FileGroupReader. The main difference is that the FileGroupReader encapsulates
> the merging logic for a file slice's logs and parquet base file, so each
> iterator no longer has to do the redundant work of querying the fs view and
> composing the file slices.
> We can integrate step by step for the different read views: 1. snapshot queries;
> 2. read optimized queries; 3. skip merge queries.
> For usability and a smooth transition, we should add a flag for the new reader;
> the old code path should be kept for 1 or 2 releases.
> The major action items include:
> 1. implement the HoodieFlinkRecord, akin to the HoodieSparkRecord;
> 2. implement the Flink specific FileGroupReader with the HoodieFlinkRecord;
> 3. Flink implements the snapshot queries using the file group reader;
> 4. Flink implements the read optimized queries using the file group reader;
> 5. Flink implements the skip merge queries using the file group reader.
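
To illustrate how the three read views in the description differ once a single reader owns the base/log merging, here is a minimal, self-contained sketch. It is not Hudi code: FileGroupReaderSketch, Rec, ReadView and read are hypothetical names, records are reduced to string key/value pairs, and the skip merge behaviour shown (base and log records emitted side by side, deletes dropped) is a simplifying assumption rather than the actual Flink implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names are Hudi APIs. */
public class FileGroupReaderSketch {

    /** The read views listed in the issue description. */
    enum ReadView { SNAPSHOT, READ_OPTIMIZED, SKIP_MERGE }

    /** A simplified record; a null value marks a delete (log records only). */
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Reads one file slice (base records plus ordered log records) for a view.
     * The caller never touches the merge logic, which is the point of routing
     * every iterator through one reader.
     */
    static List<String> read(List<Rec> base, List<Rec> logs, ReadView view) {
        List<String> out = new ArrayList<>();
        if (view == ReadView.SNAPSHOT) {
            // Apply log records on top of the base file, keyed by record key.
            Map<String, String> merged = new LinkedHashMap<>();
            for (Rec r : base) {
                merged.put(r.key, r.value);
            }
            for (Rec r : logs) {
                if (r.value == null) {
                    merged.remove(r.key);
                } else {
                    merged.put(r.key, r.value);
                }
            }
            merged.forEach((k, v) -> out.add(k + "=" + v));
            return out;
        }
        // READ_OPTIMIZED and SKIP_MERGE both start from the base file as-is.
        for (Rec r : base) {
            out.add(r.key + "=" + r.value);
        }
        if (view == ReadView.SKIP_MERGE) {
            // Assumption: log records are appended without being combined.
            for (Rec r : logs) {
                if (r.value != null) {
                    out.add(r.key + "=" + r.value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> base = List.of(new Rec("a", "1"), new Rec("b", "2"));
        List<Rec> logs = List.of(new Rec("a", "3"), new Rec("b", null), new Rec("c", "4"));
        for (ReadView view : ReadView.values()) {
            System.out.println(view + " -> " + read(base, logs, view));
        }
    }
}
```

In the real integration the flag mentioned in the description would decide whether a query goes through a reader of this kind or through the existing iterators; that dispatch is left out here to keep the sketch short.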
--
This message was sent by Atlassian Jira
(v8.20.10#820010)