Hi Yuxia,

Thank you for your question; it is a valid observation.

By design, Hudi encapsulates its internal data structures per engine rather
than exposing a unified, abstracted data model. It therefore has no
engine-agnostic internal row type comparable to Paimon's InternalRow or
Iceberg's Record. The current Hudi source module is built specifically for
Flink, so for both COW and MOR tables the data is read and parsed into
Flink's RowData.

This follows Hudi's principle of using engine-native data structures for
performance and for compatibility with the target engine's execution
pipeline. If support for Spark (or other engines such as Trino/Presto) is
added later, it will need dedicated engine-specific parsing logic that
converts Hudi's FileSlice directly into Spark's InternalRow (or the
equivalent native structure for the other engine), rather than going through
a single universal intermediate format.

The engine-dependent design is intentional: it reduces
serialization/deserialization overhead and integrates more deeply with each
engine's native execution model, which matters for Hudi's performance in
large-scale batch and stream processing.

Best regards,
Fei Han

On 2026-03-24 19:28:49, "yuxia" <[email protected]> wrote:
>Hi, Fei Han
>
>Thanks for proposing this FIP. It’s great to see Fluss gaining more complete
>support for data lakes, and overall this looks LGTM to me.
>
>I just have one small question about this part:
>
>> “COW and MOR table types employ different reading methods to parse FileSlice
>> into Flink RowData, which is then further converted into Fluss InternalRow.”
>
>From this description, it seems the read path currently depends on Flink’s
>RowData. Is it possible to avoid that dependency and convert directly into
>Fluss InternalRow instead?
>If so, the overall dependency footprint of the Hudi module could likely be
>much lighter.
>
>Best regards,
>Yuxia
>
>----- Original Message -----
>From: "han" <[email protected]>
>To: "dev" <[email protected]>
>Sent: Tuesday, March 24, 2026, 6:18:59 PM
>Subject: [DISCUSS] FIP-24:Support tiering Fluss data to Hudi
>
>Hi devs,
>
>I'd like to start a discussion on FIP-24: Support tiering Fluss data to Hudi.
>
>Just as the tiering service already syncs data to Paimon and Iceberg, Hudi
>support is also required.
>
>You can find more details here [1]. Any feedback and suggestions are welcome!
>
>[1]
>https://cwiki.apache.org/confluence/display/FLUSS/FIP-24%3A+Support+tiering+Fluss+data+to+Hudi
>
>Best regards,
>Fei Han
