yuzhaojing commented on PR #12795: URL: https://github.com/apache/hudi/pull/12795#issuecomment-2655457292
> Hi, all! I want to discuss again: what result do we want in the end? Do we want to focus on abstraction and reusability, or on Flink performance? The answer to this question will affect all the surrounding work.
>
> A rough description of what happens in a Flink write:
>
> 1. The `DataStream<RowData>` is converted into a record type carrying the necessary Hudi metadata.
> 2. A decision is made where to route each record (preparing file names, etc.) in order to rebalance the stream between writers.
> 3. The records are actually written to the file system.
> 4. Table services run. (This step is out of scope here, because at this point there is no stream of records anymore.)
>
> Between these steps we serialize/deserialize every record, and this accounts for almost half of the total cost.
>
> I tested switching from `HoodieRecord` to a Flink `Tuple(metadata, RowData)` and got a remarkable performance increase. But when I tried `Tuple(Tuple(metadata), RowData)`, I lost about 5% of performance. So **even a change as small as one nested `Tuple` matters**.
>
> #### Option 1, reusability
> Here we implement the changes proposed in this RFC and convert `RowData` into the new `HoodieFlinkRecord` in _Operator 1_. But the serde costs will stay almost the same.
>
> #### Option 2, performance
> #12796 is ready for review: it switches from `HoodieRecord` to `HoodieFlinkInternalRow` in _Operator 1_ and _Operator 2_. `HoodieFlinkInternalRow` does not extend `HoodieRecord` and contains only the necessary data; `HoodieFlinkInternalRowSerializer` is implemented for maximum performance. In _Operator 3_, however, records are still converted into Avro.
>
> #### Option 3, combination of both
> The most promising roadmap, as I see it, is to use `HoodieFlinkInternalRow` in _Operator 1_ and _Operator 2_, and to switch to the new `HoodieFlinkRecord` proposed here in _Operator 3_.
>
> There are two main steps toward a major Flink performance breakthrough with Hudi:
>
> 1. Optimize serde up to the writers.
> 2. Optimize the data structures used during writes.
>
> Step 1 is already implemented and awaits review. Step 2 is proposed here and will take some time.
>
> **I ask the community to unblock the work on step 1**, because I see a misunderstanding of the purpose of #12796.
>
> @danny0405, @yuzhaojing, @zhangyue19921010, @voonhous, @Alowator, @wombatu-kun, what do you think about it?
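To make the nested-`Tuple` cost concrete: each extra wrapper adds another serializer layer, so every record pays additional delegation and, depending on the serializer, extra framing bytes. The following is a toy, pure-Java model of flat vs. nested encoding — an illustrative assumption, not Flink's actual `TupleSerializer` logic (where the cost is mostly the extra per-field delegation rather than explicit framing):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy model: the flat layout writes metadata fields and the row payload in one
// pass; the nested layout serializes the inner "tuple" first and frames it
// inside the outer record, adding per-record overhead.
public class NestedTupleCost {
    public static byte[] flat(String recordKey, String partition, byte[] row) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(recordKey);   // metadata field 1
        out.writeUTF(partition);   // metadata field 2
        out.writeInt(row.length);  // row payload
        out.write(row);
        return bos.toByteArray();
    }

    public static byte[] nested(String recordKey, String partition, byte[] row) throws IOException {
        // Inner tuple is serialized on its own, then embedded in the outer one.
        ByteArrayOutputStream innerBos = new ByteArrayOutputStream();
        DataOutputStream inner = new DataOutputStream(innerBos);
        inner.writeUTF(recordKey);
        inner.writeUTF(partition);
        byte[] metadata = innerBos.toByteArray();

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(metadata.length); // extra framing introduced by the nesting
        out.write(metadata);
        out.writeInt(row.length);
        out.write(row);
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] row = new byte[64];
        int flatSize = flat("key-1", "2024-01-01", row).length;
        int nestedSize = nested("key-1", "2024-01-01", row).length;
        // In this model, nesting is strictly larger per record.
        System.out.println("flat=" + flatSize + " nested=" + nestedSize);
    }
}
```

Multiplied over millions of records per checkpoint, even a few bytes and one extra delegation per record becomes measurable, which is consistent with the ~5% regression reported above.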
I think both aspects are equally important. Abstraction and reusability can significantly aid the subsequent development and code structure of the entire project: this not only enhances Flink's performance but also greatly benefits other engines such as Spark and Hive. However, I think the performance improvement for Flink could be treated as an independent optimization point, while this RFC primarily focuses on the improvements related to the writer/reader format and schema. What are your thoughts on this? @danny0405 @cshuo
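On the "serializer implemented for maximum performance" point discussed above: a serializer specialized to one fixed field layout can write each field directly, with no generic type dispatch or nested-serializer delegation per record. A minimal pure-Java sketch of that idea — the class and field names are illustrative assumptions, not the actual `HoodieFlinkInternalRowSerializer` from #12796:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative sketch: a fixed-layout record holding only the Hudi metadata a
// writer needs plus the opaque row payload, with a hand-written serializer.
public class FixedLayoutSerializerSketch {
    // Hypothetical minimal record (field set chosen for illustration only).
    public record InternalRow(String recordKey, String partitionPath, String fileId, byte[] row) {}

    public static void serialize(InternalRow r, DataOutputStream out) throws IOException {
        // Fixed field order, no per-record schema or type dispatch.
        out.writeUTF(r.recordKey());
        out.writeUTF(r.partitionPath());
        out.writeUTF(r.fileId());
        out.writeInt(r.row().length);
        out.write(r.row());
    }

    public static InternalRow deserialize(DataInputStream in) throws IOException {
        String key = in.readUTF();
        String partition = in.readUTF();
        String fileId = in.readUTF();
        byte[] row = new byte[in.readInt()];
        in.readFully(row);
        return new InternalRow(key, partition, fileId, row);
    }

    public static void main(String[] args) throws IOException {
        InternalRow r = new InternalRow("key-1", "2024/01/01", "file-0", new byte[]{1, 2, 3});
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        serialize(r, new DataOutputStream(bos));
        InternalRow back = deserialize(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(back.recordKey()); // round-trips the record key
    }
}
```

In Flink, the equivalent would be a dedicated `TypeSerializer` for the record type; the design point either way is the same: trading generality for a branch-free, fixed-layout hot path on the serde boundary between operators.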
