wombatu-kun commented on PR #12795: URL: https://github.com/apache/hudi/pull/12795#issuecomment-2655686309
> #### Option 1, reusability
> In this case, we need to implement the proposed changes and convert `RowData` into the new `HoodieFlinkRecord` at _Operator 1_. But serde costs will be almost the same.
>
> #### Option 2, performance
> #12796 is ready for review; it implements the switch to `HoodieFlinkInternalRow` instead of `HoodieRecord` at _Operator 1_ and _Operator 2_. `HoodieFlinkInternalRow` doesn't extend `HoodieRecord` and contains only the necessary data. `HoodieFlinkInternalRowSerializer` is implemented for maximum performance. But at _Operator 3_, the data is still converted into Avro.
>
> #### Option 3, combination of both
> I think the most promising roadmap is to use `HoodieFlinkInternalRow` in _Operator 1_ and _Operator 2_, and to switch to the new `HoodieFlinkRecord` proposed here in _Operator 3_.
>
> There are two main steps in making a huge Flink performance breakthrough with Hudi:
>
> 1. Optimize serde up to the writers.
> 2. Optimize data structures in writes.
>
> Step 1 is already implemented and awaits review. Step 2 is proposed here and would take some time.

Hi, Geser. I agree with you; I think the 3rd option is the best way: merge your optimizations first, and then implement this RFC together (with @cshuo and @Alowator) as a second step.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
