This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7b9eeca1b132 docs(blog): flink async instant generation blog (#17818)
7b9eeca1b132 is described below
commit 7b9eeca1b13219f84cb3847495644379458f340d
Author: Shiyan Xu <[email protected]>
AuthorDate: Fri Jan 9 12:09:14 2026 -0600
docs(blog): flink async instant generation blog (#17818)
---
...09-hudi-11-deep-dive-flink-async-instant-gen.md | 147 +++++++++++++++++++++
.../async-instant-gen.png | Bin 0 -> 149583 bytes
.../blocking-instant-gen.png | Bin 0 -> 150864 bytes
.../file-slicing-completion-time.png | Bin 0 -> 34356 bytes
.../file-slicing-instant-time.png | Bin 0 -> 33333 bytes
.../lsm-timeline.png | Bin 0 -> 100086 bytes
.../timeline.png | Bin 0 -> 283012 bytes
.../truetime.png | Bin 0 -> 39169 bytes
.../writemeta-failover.png | Bin 0 -> 161470 bytes
9 files changed, 147 insertions(+)
diff --git
a/website/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen.md
b/website/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen.md
new file mode 100644
index 000000000000..690063f66c42
--- /dev/null
+++ b/website/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen.md
@@ -0,0 +1,147 @@
+---
+title: "Apache Hudi 1.1 Deep Dive: Async Instant Time Generation for Flink
Writers"
+excerpt: "Explore how Hudi 1.1 introduces asynchronous instant generation for
Flink writers to eliminate throughput fluctuations in streaming ingestion."
+authors: [shuo-cheng]
+category: blog
+image:
/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/async-instant-gen.png
+tags:
+ - hudi
+ - flink
+ - streaming
+---
+
+---
+
+_This blog was translated from the [original blog in
Chinese](https://mp.weixin.qq.com/s/3r06DFdkaiGkF1_7NiovZw)._
+
+---
+
+## Background
+
+Before the Hudi 1.1 release, in order to guarantee the exactly-once semantics
of the Hudi Flink sink, a new instant could only be generated after the
previous instant was successfully committed to Hudi. During this period, Flink
writers had to block and wait. Starting from Hudi 1.1, we introduce a new
asynchronous instant generation mechanism for Flink writers. This approach
allows writers to request the next instant even before the previous one has
been committed successfully. At the sa [...]
+
+## Instant Time
+
+Timeline is a core component of Hudi's architecture. It serves as the single
source of truth for the table's state, recording all operations performed on a
table. Each operation is identified by a commit with a monotonically increasing
instant time, which indicates the start time of each transaction.
+
+
+
+Hudi provides the following capabilities based on instant time:
+
+- **More efficient write rollbacks**: Each Hudi commit corresponds to an
instant time. The instant timestamp can be used to quickly locate files
affected by failed writes.
+- **File name-based file slicing**: Since instant time is encoded into file
names, Hudi can efficiently perform file slicing across different versions of
files within a table.
+- **Incremental queries**: Each row in a Hudi table carries a
`_hoodie_commit_time` metadata field. This allows incremental queries at any
point in the timeline, even when full compaction or cleaning services are
running asynchronously.
+
+## Completion Time
+
+### File Slicing Based on Instant Time
+
+Before release 1.0, Hudi organized data files in units called `FileGroup`.
Each file group contains multiple `FileSlice`s. Each file slice contains one
base file and multiple log files. Every compaction on a file group generates a
new file slice. The timestamp in the base file name corresponds to the instant
time of the compaction operation that wrote the file. The timestamp in the log
file name is the same as the base instant time of the current file slice. Data
files with the same inst [...]
+
+
+
+In concurrent write scenarios, the instant time naming convention of log files
introduces certain limitations: as asynchronous compaction progresses, the base
instant time can change. To ensure that writers can correctly determine the
base instant time, the ordering between write commits and compaction scheduling
must be enforced—meaning that compaction can only be scheduled when there are
no ongoing write operations on the table. Otherwise, a log file might be
written with an incorrect [...]
+
+### File Slicing Based on Completion Time
+
+To address these issues, starting from version 1.0, Hudi introduced a new file
slicing model based on a time interval defined by requested time and completion
time. In release 1.x, each commit has two important time concepts: requested
time and completion time. All generated timestamps are globally monotonically
increasing. The timestamp in the log file name is no longer the base instant
time, but rather the requested instant time of the write operation. During the
file slicing process, [...]
+
+> _A log file belongs to the file slice with the maximum base requested time
that is less than or equal to the log file's completion time._ [5]
+
+
+
+The new file slicing mechanism is more flexible, allowing compaction
scheduling to be completely decoupled from the writer's ingestion process.
Based on this mechanism, Hudi has also implemented powerful non-blocking
concurrency control. For details, refer to RFC-66 [5].
+
+### LSM Timeline
+
+The new file slicing mechanism requires efficient querying of completion time
based on instant time. Starting from version 1.x, Hudi re-implemented the
archived timeline. The new archived timeline organizes its data files in an
LSM-tree structure, enabling fast range-filtering queries based on instant time
and supporting efficient data skipping. For more details, refer to [6].
+
+
+
+## TrueTime
+
+A critical premise of Hudi's timeline is that the timestamp generated for
instants must be globally monotonically increasing without any conflicts.
However, maintaining time monotonicity in distributed transactions has long
been a thorny problem due to:
+
+- **Clock skew**: Physical clocks drift apart across machines.
+- **Network latency**: Communication delays between data centers.
+- **Concurrent transactions**: Difficulty in determining event ordering across
regions.
+
+To solve this problem, Google Spanner's TrueTime [1] uses a globally
synchronized clock with bounded uncertainty, allowing it to assign timestamps
monotonically without conflicts. Similarly, Hudi introduced the TrueTime API
[2] starting from the 1.0 release. There are generally two approaches to
realize TrueTime semantics:
+
+- A single shared time generator process or service, like Google Spanner's
time service.
+- Each process generates its own time and waits until time >= maximum expected
clock drift across all processes, coordinated within a distributed lock.
+
+
+
+Hudi's TrueTime API adopts the second approach: using a distributed lock to
ensure only one process generates time at any given moment. The waiting
mechanism ensures sufficient time passes so that the instant time is
monotonically increasing globally.
+
+## Blocking Instant Time Generation for Flink Writers
+
+For Flink streaming ingestion, each incremental transaction write can be
roughly divided into the following stages:
+
+1. Writers write records into the in-memory buffer.
+2. When the buffer is full or a checkpoint is triggered, writers send a
request to the coordinator for an instant time.
+3. Writers create data files based on the instant time and flush the data to
storage.
+4. The coordinator commits the flushed files and metadata after receiving the
ACK event for the successful checkpoint.
+
+
+
+From Hudi's file slicing mechanism, we can see that the instant time is
required before writers flush records into storage. Before 1.1, although the
committing operation was performed asynchronously in the coordinator, the
writer's request for a new instant would be blocked, because only after the
previous instant is successfully committed can the coordinator create a new
instant. Let's say a batch of records finished flushing with checkpoint `ckp_1`
in the writer at `T1`, and `ckp_1` co [...]
+
+## Async Instant Time Generation
+
+To address throughput fluctuation caused by blocking instant generation, Hudi
1.1 introduces asynchronous instant generation for Flink writers. Using the
aforementioned example, "async" means that before the previous instant is
successfully committed to the timeline (during the time range `[T1, T2]`), the
coordinator can create a new instant and return it to the writer for flushing a
new batch of data. Thus, writers are no longer blocked during checkpoints, and
the ingestion throughput w [...]
+
+
+
+**Overall write workflow:**
+
+- **Writer**: Requests instant time from the coordinator before flushing data:
+ - Data flushing may be triggered either by a checkpoint or by the buffer
being full, so the request carries the last completed checkpoint ID, rather
than the current checkpoint ID.
+ - For a request made at the initial startup of a task, if it's recovered
from state, the checkpoint ID is fetched from the state; otherwise, it's set to
-1.
+- **Coordinator**: Responsible for instant generation. Each checkpoint ID
corresponds to an instant. The coordinator maintains the mapping in the
WriteMeta buffer: `CheckpointID → { Instant → { Writer TaskID → Writer
MetaEvent }}`, where `{Writer TaskID → Writer MetaEvent}` is the mapping
between each writer's parallel task ID and the file metadata written during the
current checkpoint interval. Upon receiving an instant request from a writer:
+ - If an instant corresponding to the checkpoint ID already exists in the
WriteMeta buffer, the cached instant is returned directly.
+ - Otherwise, a new instant is generated, added to the WriteMeta buffer, and
then returned to the writer.
+- **Writer**: Completes data writing, generates data files, and sends the file
metadata to the coordinator.
+- **Coordinator**: Upon receiving the file metadata from a writer, updates the
WriteMeta buffer:
+ - If there is no existing metadata for the current writer task in the
buffer, the metadata is written directly into the cache.
+ - If metadata for the current writer task already exists (i.e., the writer
has flushed multiple times), the metadata is merged and then the cache is
updated.
+- **Coordinator**: When the ACK event for a checkpoint (ID = n) is received,
it starts serially and orderly committing the instants recorded in the
WriteMeta buffer. The commit scope includes all instants corresponding to
checkpoint IDs < n. Once committed successfully, the instants are removed from
the buffer.
+ - Since Flink's checkpoint ACK mechanism does not guarantee that the
coordinator will receive an ACK for every checkpoint, the commit logic follows
Flink's Checkpoint Subsume Contract [4]: if the ACK for checkpoint `ckp_i` is
not received, its written metadata is subsumed into the pending metadata for
the next checkpoint `ckp_i+1`, and will be committed once the ACK for `ckp_i+1`
is received.
+
+### WriteMeta Failover
+
+Considering that the coordinator cannot guarantee receiving every checkpoint
ACK event, the in-memory WriteMeta buffer needs to be persisted to Flink state
to prevent data loss after a task failover.
+
+When designing the snapshot for the WriteMeta buffer, the following points
need to be considered:
+
+- WriteMeta is initially stored in the writer's buffer. Only when a checkpoint
is triggered or an eager flush occurs does the writer send the current
WriteMeta to the coordinator.
+- During checkpointing, the coordinator takes a snapshot first, followed by
the writers' snapshot. At this point, the coordinator's cache does not yet
include the WriteMeta for the current checkpoint interval.
+
+
+
+Based on the above considerations, the WriteMeta snapshot process for
checkpoint i is as follows:
+
+1. The coordinator first persists the WriteMeta for all checkpoint IDs < i to
Flink state.
+2. The writer sends the WriteMeta generated during checkpoint i to the
coordinator, awaiting commit.
+3. The writer cleans up the historical WriteMeta in state and persists the
WriteMeta for checkpoint i to Flink state.
+
+This approach of persisting the WriteMeta buffer ensures that metadata is not
lost, while also preventing state bloat in the WriteMeta state.
+
+## Conclusion
+
+The async instant time generation mechanism introduced in Hudi 1.1 is an
important feature for improving the stability of Flink streaming ingestion. By
eliminating the blocking dependency on the completion of the previous instant's
commit, this mechanism solves the throughput fluctuation and backpressure
problems under large-scale workloads. At the same time, it still maintains
strong transactional guarantees and seamless integration with Flink's
checkpoint semantics. This enhancement is [...]
+
+## References
+
+[1]
[https://research.google/pubs/spanner-truetime-and-the-cap-theorem/](https://research.google/pubs/spanner-truetime-and-the-cap-theorem/)
+
+[2]
[https://hudi.apache.org/docs/next/timeline/#timeline-components](https://hudi.apache.org/docs/next/timeline/#timeline-components)
+
+[3]
[https://github.com/apache/hudi/blob/master/rfc/rfc-66/rfc-66.md](https://github.com/apache/hudi/blob/master/rfc/rfc-66/rfc-66.md)
+
+[4]
[https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/state/CheckpointListener.java](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/state/CheckpointListener.java)
+
+[5]
[https://github.com/apache/hudi/blob/master/rfc/rfc-66/rfc-66.md](https://github.com/apache/hudi/blob/master/rfc/rfc-66/rfc-66.md)
+
+[6]
[https://hudi.apache.org/docs/next/timeline/#lsm-timeline-history](https://hudi.apache.org/docs/next/timeline/#lsm-timeline-history)
diff --git
a/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/async-instant-gen.png
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/async-instant-gen.png
new file mode 100644
index 000000000000..94959c3791bc
Binary files /dev/null and
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/async-instant-gen.png
differ
diff --git
a/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/blocking-instant-gen.png
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/blocking-instant-gen.png
new file mode 100644
index 000000000000..0c0afd39a787
Binary files /dev/null and
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/blocking-instant-gen.png
differ
diff --git
a/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/file-slicing-completion-time.png
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/file-slicing-completion-time.png
new file mode 100644
index 000000000000..0af0fd092d94
Binary files /dev/null and
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/file-slicing-completion-time.png
differ
diff --git
a/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/file-slicing-instant-time.png
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/file-slicing-instant-time.png
new file mode 100644
index 000000000000..961ae919c115
Binary files /dev/null and
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/file-slicing-instant-time.png
differ
diff --git
a/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/lsm-timeline.png
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/lsm-timeline.png
new file mode 100644
index 000000000000..b4f282e1bf54
Binary files /dev/null and
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/lsm-timeline.png
differ
diff --git
a/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/timeline.png
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/timeline.png
new file mode 100644
index 000000000000..c47f8ee252a2
Binary files /dev/null and
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/timeline.png
differ
diff --git
a/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/truetime.png
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/truetime.png
new file mode 100644
index 000000000000..f828b6b02171
Binary files /dev/null and
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/truetime.png
differ
diff --git
a/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/writemeta-failover.png
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/writemeta-failover.png
new file mode 100644
index 000000000000..97e5a0614a25
Binary files /dev/null and
b/website/static/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/writemeta-failover.png
differ