Thanks for the clarification. I don’t have any more comments — overall LGTM.

Best regards,
Yuxia

----- Original Message -----
From: "han" <[email protected]>
To: "dev" <[email protected]>, "yuxia" <[email protected]>
Sent: Monday, March 30, 2026, 10:40:03 AM
Subject: Re: Re: [DISCUSS] FIP-24: Support tiering Fluss data to Hudi

Hi Yuxia,

Thank you for your question. This is indeed a valid observation.

From a design perspective, Hudi adopts an engine-specific encapsulation 
approach for its internal data structures rather than a unified, abstracted 
data model. As such, it does not have a universal internal data structure akin 
to Paimon's InternalRow or Iceberg's Record that is agnostic to query engines.

The current implementation of the Hudi source module is purpose-built for the 
Flink engine, which means data is read and parsed into Flink's RowData format 
when processing COW and MOR table types. This design aligns with Hudi's core 
principle of leveraging engine-native data structures to optimize performance 
and compatibility with the target engine's execution pipeline.
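To make the conversion path concrete, here is a minimal sketch of the adapter idea: wrap the engine-native row (Flink's RowData) behind the Fluss-side row abstraction instead of copying fields. The `RowData` and `InternalRow` interfaces below are simplified stand-ins for illustration only, and the wrapper name `FlinkRowAsFlussRow` is hypothetical; the real Flink and Fluss interfaces are much richer than this.

```java
// Simplified stand-ins, NOT the real Flink/Fluss APIs.
interface RowData {                 // engine-native row (Flink side)
    int getInt(int pos);
    String getString(int pos);
}

interface InternalRow {             // Fluss-side row abstraction
    int getInt(int pos);
    String getString(int pos);
}

/** Wraps a Flink-style RowData as a Fluss-style InternalRow without copying. */
final class FlinkRowAsFlussRow implements InternalRow {
    private final RowData row;

    FlinkRowAsFlussRow(RowData row) { this.row = row; }

    @Override public int getInt(int pos) { return row.getInt(pos); }
    @Override public String getString(int pos) { return row.getString(pos); }
}

public class ConversionSketch {
    public static void main(String[] args) {
        // Pretend this RowData came out of the Hudi COW/MOR read path.
        RowData flinkRow = new RowData() {
            public int getInt(int pos) { return 42; }
            public String getString(int pos) { return "fluss"; }
        };
        InternalRow flussRow = new FlinkRowAsFlussRow(flinkRow);
        System.out.println(flussRow.getInt(0) + "," + flussRow.getString(1));
        // prints: 42,fluss
    }
}
```

A wrapper like this keeps the per-record conversion allocation-light, but it still carries a compile-time dependency on the engine's row type, which is exactly the trade-off discussed above.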

If support for Spark (or other engines like Trino/Presto) were to be added in 
the future, it would require dedicated development effort to implement 
engine-specific parsing logic—converting Hudi's FileSlice directly into Spark's 
InternalRow (or equivalent native structures for other engines)—rather than 
relying on a single universal intermediate format.

This engine-dependent design is intentional: it reduces 
serialization/deserialization overhead and increases integration depth with 
each engine's native execution model, which is crucial for Hudi's performance 
in large-scale batch/stream processing scenarios.

Best regards,
Fei Han


On 2026-03-24 19:28:49, "yuxia" <[email protected]> wrote:
>Hi, Fei Han
>
>Thanks for proposing this FIP. It’s great to see Fluss gaining more complete 
>support for data lakes, and overall it looks good to me.
>
>I just have one small question about this part:
>
>> “COW and MOR table types employ different reading methods to parse FileSlice 
>> into Flink RowData, which is then further converted into Fluss InternalRow.”
>
>From this description, it seems the read path currently depends on Flink’s 
>RowData. Is it possible to avoid that dependency and convert directly into 
>Fluss InternalRow instead? If so, the overall dependency footprint of the 
>Hudi module could likely be much lighter.
>
>
>Best regards,
>Yuxia
>
>----- Original Message -----
>From: "han" <[email protected]>
>To: "dev" <[email protected]>
>Sent: Tuesday, March 24, 2026, 6:18:59 PM
>Subject: [DISCUSS] FIP-24: Support tiering Fluss data to Hudi
>
>Hi devs,
>
>
>I'd like to start a discussion on FIP-24: Support tiering Fluss data to Hudi.
>
>
>Just like syncing data to Paimon and Iceberg via the tiering service, support 
>for Hudi is also required.
>
>
>You can find more details here [1], and any feedback and suggestions are 
>welcome!
>
>[1]
>https://cwiki.apache.org/confluence/display/FLUSS/FIP-24%3A+Support+tiering+Fluss+data+to+Hudi
>
>Best regards,
>Fei Han
