This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new a7e7c64d88b0 docs(blog): new blog flink 1.1 optimization (#17542)
a7e7c64d88b0 is described below
commit a7e7c64d88b022dbc016587795542bb040c3f770
Author: Shiyan Xu <[email protected]>
AuthorDate: Tue Dec 9 20:54:14 2025 -0600
docs(blog): new blog flink 1.1 optimization (#17542)
---
...ve-optimizing-streaming-ingestion-with-flink.md | 131 +++++++++++++++++++++
.../benchmark-schema1.png | Bin 0 -> 99878 bytes
.../benchmark-string-schemas.png | Bin 0 -> 252290 bytes
.../bytes-copy-optimization.png | Bin 0 -> 226443 bytes
.../flink-native-write-path.png | Bin 0 -> 882382 bytes
.../flink-write-pipeline-before-1.1.png | Bin 0 -> 466691 bytes
.../legacy-avro-write-path.png | Bin 0 -> 865297 bytes
.../log-file-data-block-structure.png | Bin 0 -> 272885 bytes
.../optimized-serde-hoodie-flink-internal-row.png | Bin 0 -> 474851 bytes
9 files changed, 131 insertions(+)
diff --git
a/website/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink.md
b/website/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink.md
new file mode 100644
index 000000000000..73737d4dd033
--- /dev/null
+++
b/website/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink.md
@@ -0,0 +1,131 @@
+---
+title: "Apache Hudi 1.1 Deep Dive: Optimizing Streaming Ingestion with Apache
Flink"
+excerpt: ''
+author: 'Shuo Cheng'
+category: blog
+image: /assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/benchmark-string-schemas.png
+tags:
+ - hudi
+ - flink
+ - performance
+---
+
+---
+
+_This blog was translated from the [original blog in Chinese](https://mp.weixin.qq.com/s/ek80Fzw30FOawk1qeWxtCw)._
+
+---
+
+## Background
+
+With the rise of real-time data processing, streaming ingestion has become a critical use case for Apache Hudi. Apache Flink, a robust stream processing framework, has been seamlessly integrated with Hudi to support near-real-time data ingestion. While the Flink integration already provides powerful and comprehensive capabilities (robust exactly-once guarantees backed by Flink's checkpointing mechanism, flexible write modes, and rich index management), ingestion performance and stability come under increasing pressure as data volumes scale.
+
+Multiple factors impact streaming ingestion performance, such as the network shuffle overhead between Flink operators, SerDe costs within Hudi writers, and GC pressure caused by the memory management of in-memory buffers. In Hudi 1.1, meticulous refactoring and optimization work was conducted to solve these problems, significantly enhancing the performance and stability of streaming ingestion with Flink.
+
+In the subsequent sections, several key performance optimizations are
introduced, including:
+
+- Optimized SerDe between Flink operators
+- New performant Flink-native writers
+- Eliminated bytes copy for MOR log file writing
+
+Following that, performance benchmarks for streaming ingestion in Hudi 1.1 are
presented to demonstrate the concrete improvements achieved.
+
+## Optimized SerDe Between Flink Operators
+
+Before Hudi 1.1, Avro was the default internal record format in the Flink write/read path, which meant the first step in the write pipeline was converting Flink `RowData` into an Avro record payload, which was then used to create Hudi's internal `HoodieRecord` for subsequent processing.
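+
+To make the cost concrete, here is a minimal sketch of roughly what that per-record conversion entails, written against Flink's `flink-avro` utilities; the class and the two-field schema below are illustrative, not Hudi's actual code:
+
+```java
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.flink.formats.avro.RowDataToAvroConverters;
+import org.apache.flink.formats.avro.typeutils.AvroSchemaConverter;
+import org.apache.flink.table.data.GenericRowData;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.data.StringData;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.table.types.logical.VarCharType;
+
+public class LegacyAvroPathSketch {
+  public static void main(String[] args) {
+    // Illustrative schema: one string key and one numeric field.
+    RowType rowType = RowType.of(new VarCharType(VarCharType.MAX_LENGTH), new BigIntType());
+    Schema avroSchema = AvroSchemaConverter.convertToSchema(rowType);
+    RowDataToAvroConverters.RowDataToAvroConverter converter =
+        RowDataToAvroConverters.createConverter(rowType);
+
+    RowData row = GenericRowData.of(StringData.fromString("key-1"), 42L);
+    // The pre-1.1 pipeline performed a conversion like this for every
+    // incoming record before it could build a HoodieRecord.
+    GenericRecord avroRecord = (GenericRecord) converter.convert(avroSchema, row);
+    System.out.println(avroRecord);
+  }
+}
+```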
+
+
+
+Almost every Flink job has to exchange data between its operators, and when the operators are not chained together (i.e., not running in the same task), records must first be serialized to bytes before being sent to the downstream operator over the network. The shuffle serialization alone can be quite costly if not executed efficiently; when you examine the profiler output of such a job, you will likely see serialization among the top consumers of CPU cycles. Flink actually has an extensible type system that allows registering custom `TypeInformation` and `TypeSerializer` implementations, so well-typed records can avoid the slow, generic (Kryo) serialization fallback.
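+
+Whether a given pipeline hits the efficient path is easy to test: Flink can be told to fail fast whenever a record type would fall back to the generic Kryo serializer. A minimal sketch using that switch:
+
+```java
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+public class KryoFallbackCheck {
+  public static void main(String[] args) throws Exception {
+    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+    // Fail when building the job if any operator's record type would need
+    // the slow, generic Kryo serializer instead of a dedicated one.
+    env.getConfig().disableGenericTypes();
+
+    env.fromElements("a", "b", "c") // String has an efficient built-in serializer
+        .map(String::toUpperCase)
+        .print();
+    env.execute("kryo-fallback-check");
+  }
+}
+```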
+
+[RFC-84](https://github.com/apache/hudi/blob/master/rfc/rfc-84/rfc-84.md) proposed an improvement to the SerDe between operators in the Hudi write pipeline, based on Flink's extensible type system:
+
+- `HoodieFlinkInternalRow`: an object that replaces `HoodieRecord` during data shuffle; it contains `RowData` as the data field plus the necessary metadata fields, such as the record key and partition path.
+- `HoodieFlinkInternalRowTypeInfo`: a customized Flink `TypeInformation` for `HoodieFlinkInternalRow`.
+- `HoodieFlinkInternalRowSerializer`: a customized and efficient Flink
`TypeSerializer` for the SerDe of `HoodieFlinkInternalRow`.
+
+
+
+With the customized Flink-native data structure and the companion serializer,
the average streaming ingestion throughput can be boosted by about 25%.
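+
+A minimal sketch of the idea (the real classes live in the hudi-flink module; the names and fields here are simplified): wrap `RowData` together with just the metadata needed for shuffle, and serialize the pair with purpose-built code that delegates the payload to Flink's own `RowDataSerializer` instead of a generic serializer.
+
+```java
+import java.io.IOException;
+import org.apache.flink.core.memory.DataInputView;
+import org.apache.flink.core.memory.DataOutputView;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.runtime.typeutils.RowDataSerializer;
+
+public final class FlinkInternalRowSketch {
+  final String recordKey;     // metadata needed downstream
+  final String partitionPath; // metadata needed downstream
+  final RowData row;          // payload stays in Flink's binary representation
+
+  FlinkInternalRowSketch(String recordKey, String partitionPath, RowData row) {
+    this.recordKey = recordKey;
+    this.partitionPath = partitionPath;
+    this.row = row;
+  }
+
+  // Serialize: two UTF strings plus a delegated RowData write; no Avro
+  // conversion and no generic (Kryo) serialization involved.
+  static void serialize(FlinkInternalRowSketch r, RowDataSerializer rowSerializer,
+                        DataOutputView out) throws IOException {
+    out.writeUTF(r.recordKey);
+    out.writeUTF(r.partitionPath);
+    rowSerializer.serialize(r.row, out);
+  }
+
+  static FlinkInternalRowSketch deserialize(RowDataSerializer rowSerializer,
+                                            DataInputView in) throws IOException {
+    String key = in.readUTF();
+    String partition = in.readUTF();
+    return new FlinkInternalRowSketch(key, partition, rowSerializer.deserialize(in));
+  }
+}
+```
+
+In the actual implementation, `HoodieFlinkInternalRowTypeInfo` and `HoodieFlinkInternalRowSerializer` plug this kind of logic into Flink's type system so the runtime picks it up automatically during shuffle.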
+
+## New Performant Flink-Native Writers
+
+Historically, the Hudi Flink writer represented incoming records as `HoodieRecord<AvroPayload>`, with the data serialized using Avro. While this unified format worked across engines, it came at a performance cost:
+
+
+
+- **Redundant SerDe**: Take MOR table ingestion as an example. Flink reads records as `RowData`, which are converted to Avro `GenericRecord` and then serialized to Avro bytes for the internal `HoodieAvroRecord`. During the later log writing, the Avro bytes in `HoodieAvroRecord` are deserialized back into Avro `IndexedRecord` for further processing before being appended to log files. Clearly, using Avro as the intermediate record representation introduces significant SerDe overhead.
+- **Excess memory usage**: The write buffer is a Java `List` of intermediate Avro objects that are only released after being flushed to disk. These objects increase heap usage and GC pressure, particularly under high-throughput streaming workloads.
+
+### RFC-87: A Flink-Native Write Path
+
+[RFC-87](https://github.com/apache/hudi/blob/master/rfc/rfc-87/rfc-87.md)
proposes and implements a shift in data representation: instead of transforming
`RowData` to `GenericRecord`, the Flink writer now directly wraps `RowData`
inside a specialized `HoodieRecord`. The entire write path now operates around
the `RowData` structure, eliminating all redundant conversion overhead. This
change is Flink engine-specific and transparent to the overall Hudi writer
lifecycle.
+
+
+
+**Key Changes:**
+
+- **Customized HoodieRecord**: Introduced `HoodieFlinkRecord` to encapsulate
Flink's `RowData` directly—no extra conversion to Avro record in the writing
path anymore.
+- **Self-managed binary buffer**: Flink's internal `BinaryRowData` is already in a compact binary format. We've implemented a binary buffer to cache the bytes of the `RowData`. The memory is managed by the writer and can be reused after the previously cached `RowData` bytes are flushed to disk. With this self-managed binary buffer, GC pressure is effectively reduced even under high-throughput workloads.
+- **Flexible log formats**: For MOR tables, records are written as data blocks in log files. Two kinds of block types are currently supported (see the configuration sketch after this list):
+ - **Avro data block**: Optimized for row-level writes, making it ideal for
streaming ingestion or workloads with frequent updates and inserts. It's the
default block type for log files.
+ - **Parquet data block**: Columnar and better suited for data with a high
compression ratio, e.g., records with primitive type fields.
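+
+For illustration, here is a sketch that opts a MOR table into the Parquet block format via the `hoodie.logfile.data.block.format` storage config; the table definition is hypothetical, and we assume the option is passed through to the writer by the Hudi Flink connector.
+
+```java
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.TableEnvironment;
+
+public class LogBlockFormatSketch {
+  public static void main(String[] args) {
+    TableEnvironment tEnv =
+        TableEnvironment.create(EnvironmentSettings.inStreamingMode());
+    // Hypothetical MOR sink that selects Parquet data blocks for log files;
+    // omitting the option keeps the default Avro data block.
+    tEnv.executeSql(
+        "CREATE TABLE hudi_sink (\n"
+            + "  id BIGINT PRIMARY KEY NOT ENFORCED,\n"
+            + "  name STRING\n"
+            + ") WITH (\n"
+            + "  'connector' = 'hudi',\n"
+            + "  'path' = 'file:///tmp/hudi_sink',\n"
+            + "  'table.type' = 'MERGE_ON_READ',\n"
+            + "  'hoodie.logfile.data.block.format' = 'parquet'\n"
+            + ")");
+  }
+}
+```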
+
+By leveraging the Flink-native data model, the redundant Avro conversions and their SerDe overhead are eliminated from the streaming ingestion pipeline, bringing significant improvements in both write latency and resource consumption.
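+
+To illustrate the buffering idea in the self-managed binary buffer change above, here is a minimal, self-contained sketch (the actual Hudi implementation is more elaborate): serialized row bytes are appended into one backing array that is reset, not reallocated, after each flush.
+
+```java
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.Arrays;
+
+// Reusable binary buffer: steady-state appends allocate nothing because the
+// backing array survives flushes and only grows when strictly necessary.
+final class ReusableBinaryBuffer {
+  private byte[] data = new byte[1 << 20]; // 1 MiB initial capacity
+  private int position;
+
+  void append(byte[] rowBytes, int offset, int length) {
+    ensureCapacity(position + length);
+    System.arraycopy(rowBytes, offset, data, position, length);
+    position += length;
+  }
+
+  private void ensureCapacity(int required) {
+    if (required > data.length) {
+      data = Arrays.copyOf(data, Math.max(required, data.length * 2));
+    }
+  }
+
+  void flushTo(OutputStream out) throws IOException {
+    out.write(data, 0, position);
+    position = 0; // reuse the same backing array for the next batch
+  }
+}
+```
+
+Because the backing array survives flushes, steady-state ingestion produces almost no buffer-related garbage, which is where the GC relief comes from.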
+
+## Eliminate Bytes Copy for MOR Log File Writing
+
+The log file of a MOR table is composed of data blocks. When writing a log
file, the basic unit is the data block. The main contents of each block are
shown in the figure below, where the `Content` is binary bytes generated by
serializing the records in the buffer into the specified block format.
+
+
+
+As mentioned in the previous section, the default data block format is Avro. In Hudi 1.1, the serialization of this block type has been carefully optimized by eliminating record-level bytes copy. Specifically, before Hudi 1.1, each record was serialized by the Avro writer into a `ByteArrayOutputStream`, and then the `toByteArray` method was invoked to obtain the data bytes for writing into the outer output stream for the data block. Under the hood, however, `toByteArray` involves the creation of a new byte array and a full copy of the stream's internal buffer for every single record, adding avoidable allocation and copy overhead.
+
+
+
+### Improvement in Hudi 1.1
+
+In Hudi 1.1, the `writeTo` method of `ByteArrayOutputStream` is utilized to
directly write the underlying data bytes to the outer output stream for the
block, thereby avoiding additional record-level bytes copy and effectively
reducing GC pressure.
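+
+The change is easiest to see side by side; here is a minimal sketch (the method names are illustrative):
+
+```java
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+
+class DataBlockWriteSketch {
+  // Before Hudi 1.1: toByteArray() allocates a fresh array and copies the
+  // stream's internal buffer for every record.
+  static void writeRecordWithCopy(ByteArrayOutputStream recordBytes,
+                                  OutputStream blockOutput) throws IOException {
+    byte[] copy = recordBytes.toByteArray(); // extra allocation + full copy
+    blockOutput.write(copy);
+  }
+
+  // Hudi 1.1: writeTo() streams the internal buffer directly into the block
+  // output, skipping the record-level copy entirely.
+  static void writeRecordWithoutCopy(ByteArrayOutputStream recordBytes,
+                                     OutputStream blockOutput) throws IOException {
+    recordBytes.writeTo(blockOutput); // no intermediate byte[] created
+  }
+}
+```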
+
+## Benchmark
+
+To demonstrate the performance improvements in streaming ingestion with Flink, we performed comprehensive benchmark testing of Apache Hudi 1.1 vs. Apache Hudi 1.0 vs. Apache Paimon 1.0.1. There is currently no standard test dataset for streaming read/write, and since it's important to run transparent and reproducible benchmarks, we decided to use the existing [Cluster benchmark program](https://github.com/apache/paimon/tree/master/paimon-benchmark/paimon-cluster-benchmark) of Apache Paimon instead.
+
+### Cluster Environment
+
+The benchmarks were run on an Alibaba Cloud EMR cluster, with the following
settings:
+
+- **EMR on ECS**: version 5.18.1
+ - Master (x1): 8 vCPU, 32 GiB, 5 Gbps
+ - Worker (x4): 24 vCPU, 96 GiB, 12 Gbps
+- **Apache Hudi version**: 1.1 and 1.0.1
+- **Apache Paimon version**: 1.0.1
+- **Apache Flink version**: 1.17.2
+- **HDFS version**: 3.2.1
+
+### Streaming Ingestion Scenario
+
+The testing scenario is the most common streaming ingestion case in production: a MOR table with the UPSERT operation and the Bucket index. We used the Paimon Benchmark program to simulate the workloads. The data source was generated by the Flink DataGen connector, which produces records with primary keys ranging from 0 to 100,000,000; the total number of records is 500 million. For more detailed settings, refer to [this repository](https://github.com/cshuo/streaming-benchmark).
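+
+For orientation, here is a hedged sketch of what such an ingestion job looks like in Flink SQL; the table names, fields, paths, and bucket count are illustrative, and the authoritative configuration lives in the repository linked above.
+
+```java
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.TableEnvironment;
+
+public class IngestionBenchmarkSketch {
+  public static void main(String[] args) {
+    TableEnvironment tEnv =
+        TableEnvironment.create(EnvironmentSettings.inStreamingMode());
+
+    // Source: primary keys drawn from [0, 100,000,000], 500M rows in total.
+    tEnv.executeSql(
+        "CREATE TABLE datagen_source (\n"
+            + "  id BIGINT,\n"
+            + "  f1 STRING\n"
+            + ") WITH (\n"
+            + "  'connector' = 'datagen',\n"
+            + "  'fields.id.min' = '0',\n"
+            + "  'fields.id.max' = '100000000',\n"
+            + "  'number-of-rows' = '500000000'\n"
+            + ")");
+
+    // Sink: MOR table, UPSERT operation, Bucket index; async compaction is
+    // disabled for the test runs, per the note below.
+    tEnv.executeSql(
+        "CREATE TABLE hudi_sink (\n"
+            + "  id BIGINT PRIMARY KEY NOT ENFORCED,\n"
+            + "  f1 STRING\n"
+            + ") WITH (\n"
+            + "  'connector' = 'hudi',\n"
+            + "  'path' = 'hdfs:///benchmark/hudi_sink',\n"
+            + "  'table.type' = 'MERGE_ON_READ',\n"
+            + "  'write.operation' = 'upsert',\n"
+            + "  'index.type' = 'BUCKET',\n"
+            + "  'hoodie.bucket.index.num.buckets' = '256',\n"
+            + "  'compaction.async.enabled' = 'false'\n"
+            + ")");
+
+    tEnv.executeSql("INSERT INTO hudi_sink SELECT id, f1 FROM datagen_source");
+  }
+}
+```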
+
+Note that we disabled compaction in the test ingestion jobs, since it can
significantly impact the performance of the ingestion job for both Hudi and
Paimon and interfere with a fair comparison of ingestion performance. In fact,
this is also common practice in production.
+
+Additionally, the schema of a table also has a significant impact on write performance: there is a noticeable difference in processing overhead between numeric primitive-type fields and string-type fields. Therefore, besides the default table schema (Schema1) used in the Paimon Benchmark program, we also added three schemas consisting mostly of STRING-type fields, which are much more common in production scenarios.
+
+| Schema1                     | Schema2          | Schema3          | Schema4           |
+|-----------------------------|------------------|------------------|-------------------|
+| 1 String + 10 BIGINT fields | 20 String fields | 50 String fields | 100 String fields |
+
+### Benchmark Results
+
+- For Schema1, where almost all fields are of numeric type, the streaming ingestion performance of Hudi 1.1 is about 3.5 times that of Hudi 1.0. The performance gain mainly comes from the optimizations introduced in RFC-84 and RFC-87, which reduce the shuffle SerDe overhead in the ingestion pipeline and the internal Avro SerDe costs in the writer.
+- The streaming ingestion throughput of Paimon is slightly higher than that of Hudi 1.1. Through detailed profiling and analysis, we found that this performance gap mainly stems from the fact that each `HoodieRecord` contains 5 extra String-type metadata fields by default (`_hoodie_commit_time`, `_hoodie_commit_seqno`, `_hoodie_record_key`, `_hoodie_partition_path`, and `_hoodie_file_name`); in simple schema scenarios like Schema1, these record-level additional fields have a significant performance impact.
+
+
+
+- For schemas where most fields are of STRING type, Hudi 1.1 achieves the best streaming ingestion performance. Based on the profiling analysis, we found that when the data fields are all strings, Paimon's approach of directly writing to Parquet files incurs noticeable compression overhead, which negatively impacts throughput, even though the benchmark tests used SNAPPY, the fastest available compression codec. Hudi, on the other hand, writes incremental data in row-based Avro format to log files, which keeps the hot write path free of columnar encoding and heavy compression costs.
+
+
+
+## Summary
+
+The optimizations in Hudi 1.1 around Flink writer performance have brought significant, multi-fold improvements to streaming ingestion throughput. These enhancements are transparent and backward-compatible, allowing users to seamlessly upgrade their jobs from earlier Hudi versions and enjoy the substantial performance gains without any additional operational overhead.
diff --git
a/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/benchmark-schema1.png
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/benchmark-schema1.png
new file mode 100644
index 000000000000..b85d98f6a5eb
Binary files /dev/null and
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/benchmark-schema1.png
differ
diff --git
a/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/benchmark-string-schemas.png
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/benchmark-string-schemas.png
new file mode 100644
index 000000000000..a52ea1778e74
Binary files /dev/null and
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/benchmark-string-schemas.png
differ
diff --git
a/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/bytes-copy-optimization.png
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/bytes-copy-optimization.png
new file mode 100644
index 000000000000..266aa9986696
Binary files /dev/null and
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/bytes-copy-optimization.png
differ
diff --git
a/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/flink-native-write-path.png
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/flink-native-write-path.png
new file mode 100644
index 000000000000..fefc69308bf8
Binary files /dev/null and
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/flink-native-write-path.png
differ
diff --git
a/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/flink-write-pipeline-before-1.1.png
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/flink-write-pipeline-before-1.1.png
new file mode 100644
index 000000000000..b36c79ad4228
Binary files /dev/null and
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/flink-write-pipeline-before-1.1.png
differ
diff --git
a/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/legacy-avro-write-path.png
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/legacy-avro-write-path.png
new file mode 100644
index 000000000000..77b32f2b24ff
Binary files /dev/null and
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/legacy-avro-write-path.png
differ
diff --git
a/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/log-file-data-block-structure.png
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/log-file-data-block-structure.png
new file mode 100644
index 000000000000..c38710a220aa
Binary files /dev/null and
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/log-file-data-block-structure.png
differ
diff --git
a/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/optimized-serde-hoodie-flink-internal-row.png
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/optimized-serde-hoodie-flink-internal-row.png
new file mode 100644
index 000000000000..53ac9a55788f
Binary files /dev/null and
b/website/static/assets/images/blog/2025-12-10-apache-hudi-11-deep-dive-optimizing-streaming-ingestion-with-flink/optimized-serde-hoodie-flink-internal-row.png
differ