wombatu-kun commented on PR #12795:
URL: https://github.com/apache/hudi/pull/12795#issuecomment-2655686309

   > #### Option 1, reusability
   > In this case, we need to implement the proposed changes and convert 
`RowData` into the new `HoodieFlinkRecord` at _Operator 1_. But serde costs 
will be almost the same.
   > 
   > #### Option 2, performance
   > #12796 is ready for review. It switches to `HoodieFlinkInternalRow` 
instead of `HoodieRecord` at _Operator 1_ and _Operator 2_. 
`HoodieFlinkInternalRow` doesn't extend `HoodieRecord` and contains only the 
necessary data. `HoodieFlinkInternalRowSerializer` is implemented for maximum 
performance. But at _Operator 3_, the data is converted into Avro.
   > 
   > #### Option 3, combination of both
   > I think the most promising roadmap is to use `HoodieFlinkInternalRow` in 
_Operator 1_ and _Operator 2_, and to switch to the new `HoodieFlinkRecord` 
proposed here in _Operator 3_.
   > 
   > There are two main steps toward a major Flink performance breakthrough 
with Hudi:
   > 
   > 1. Optimize serde up to the writers.
   > 2. Optimize data structures in writes.
   > 
   > Step 1 is already implemented and is waiting for review. Step 2 is 
proposed here and would take some time.
   
   Hi, Geser. I agree with you: I think the 3rd option is the best way. Merge 
your optimizations first, and then implement this RFC together (with @cshuo 
and @Alowator) as a second step.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
