Re: [PR] docs(blog): flink async instant generation blog [hudi]

via GitHub Fri, 09 Jan 2026 10:12:26 -0800


Copilot commented on code in PR #17818:
URL: https://github.com/apache/hudi/pull/17818#discussion_r2677145145



##########
website/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen.md:
##########
@@ -0,0 +1,147 @@
+---
+title: "Apache Hudi 1.1 Deep Dive: Async Instant Time Generation for Flink 
Writers"
+excerpt: "Explore how Hudi 1.1 introduces asynchronous instant generation for 
Flink writers to eliminate throughput fluctuations in streaming ingestion."
+authors: [shuo-cheng]
+category: blog
+image: 
/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/async-instant-gen.png
+tags:
+  - hudi
+  - flink
+  - streaming
+---
+
+---
+
+_This blog was translated from the [original blog in 
Chinese](https://mp.weixin.qq.com/s/3r06DFdkaiGkF1_7NiovZw)._
+
+---
+
+## Background
+
+Before the Hudi 1.1 release, in order to guarantee the exactly-once semantics 
of the Hudi Flink sink, a new instant could only be generated after the 
previous instant was successfully committed to Hudi. During this period, Flink 
writers had to block and wait. Starting from Hudi 1.1, we introduce a new 
asynchronous instant generation mechanism for Flink writers. This approach 
allows writers to request the next instant even before the previous one has 
been committed successfully. At the same time, it still ensures the ordering 
and consistency of multi-transaction commits. In the following sections, we 
will first briefly introduce some of Hudi's basic concepts, and then dive into 
the details of asynchronous instant time generation.
+
+## Instant Time
+
+Timeline is a core component of Hudi's architecture. It serves as the single 
source of truth for the table's state, recording all operations performed on a 
table. Each operation is identified by a commit with a monotonically increasing 
instant time, which indicates the start time of each transaction.
+
+![Timeline](/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/timeline.png)
+
+Hudi provides the following capabilities based on instant time:
+
+- **More efficient write rollbacks**: Each Hudi commit corresponds to an 
instant time. The instant timestamp can be used to quickly locate files 
affected by failed writes.
+- **File name-based file slicing**: Since instant time is encoded into file 
names, Hudi can efficiently perform file slicing across different versions of 
files within a table.
+- **Incremental queries**: Each row in a Hudi table carries a 
`_hoodie_commit_time` metadata field. This allows incremental queries at any 
point in the timeline, even when full compaction or cleaning services are 
running asynchronously.
+
+## Completion Time
+
+### File Slicing Based on Instant Time
+
+Before release 1.0, Hudi organized data files in units called `FileGroup`. 
Each file group contains multiple `FileSlice`s. Each file slice contains one 
base file and multiple log files. Every compaction on a file group generates a 
new file slice. The timestamp in the base file name corresponds to the instant 
time of the compaction operation that wrote the file. The timestamp in the log 
file name is the same as the base instant time of the current file slice. Data 
files with the same instant time belong to the same file slice.
+
+![File Slicing based on Instant 
Time](/assets/images/blog/2026-01-09-hudi-11-deep-dive-flink-async-instant-gen/file-slicing-instant-time.png)
+
+In concurrent write scenarios, the instant time naming convention of log files 
introduces certain limitations: as asynchronous compaction progresses, the base 
instant time can change. To ensure that writers can correctly determine the 
base instant time, the ordering between write commits and compaction scheduling 
must be enforced—meaning that compaction can only be scheduled when there are 
no ongoing write operations on the table. Otherwise, a log file might be 
written with an incorrect base instant time, potentially leading to data loss. 
As a result, compaction scheduling may block all writers in concurrency mode.
+
+### File Slicing Based on Completion Time
+
+To address these issues, starting from version 1.0, Hudi introduced a new file 
slicing model based on a time interval defined by requested time and completion 
time. In release 1.x, each commit has two important time concepts: requested 
time and completion time. All generated timestamps are globally monotonically 
increasing. The timestamp in the log file name is no longer the base instant 
time, but rather the requested instant time of the write operation. During the 
file slicing process, Hudi looks up the completion time for each log file using 
its instant time and applies a new file slicing rule:
+
+> _A log file belongs to the file slice with the maximum base requested time 
that is less than or equal to the log file's completion time._ [5]

Review Comment:
   Missing space between the closing italic marker and the citation. This 
should read "time._ [5]" instead of "time._[5]", keeping the underscore so the 
emphasis stays closed.
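
   For readers checking the quoted rule itself, here is a minimal sketch of 
the completion-time file slicing rule from the hunk above. It is an 
illustration only, not Hudi's implementation: the function name 
`assign_file_slice` and the use of plain integers as timestamps are 
assumptions made for this example.

```python
from bisect import bisect_right


def assign_file_slice(base_requested_times, log_completion_time):
    """Pick the file slice for a log file (hypothetical helper, not a Hudi API).

    Rule from the quoted blog text: a log file belongs to the file slice with
    the maximum base requested time that is less than or equal to the log
    file's completion time. Returns that base requested time, or None when
    every slice starts after the log file completed.
    """
    ordered = sorted(base_requested_times)
    # bisect_right counts how many base requested times are <= completion time.
    idx = bisect_right(ordered, log_completion_time)
    return ordered[idx - 1] if idx > 0 else None


# Illustrative timestamps only; real Hudi instants are not small integers.
slices = [100, 200, 300]
print(assign_file_slice(slices, 250))
print(assign_file_slice(slices, 200))
print(assign_file_slice(slices, 50))
```

   With these sample values, a log file completed at 250 lands in the slice 
whose base requested time is 200, a completion time equal to 200 also lands in 
that slice because the rule is inclusive, and a completion time of 50 matches 
no slice.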



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
