This is an automated email from the ASF dual-hosted git repository.
hangxiang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/master by this push:
new 2ec8f8157f9 [FLINK-34119][doc] Improve description about changelog in document
2ec8f8157f9 is described below
commit 2ec8f8157f95a79ee94d609657f9b08f8f0b6a26
Author: Hangxiang Yu <[email protected]>
AuthorDate: Sat Jan 13 14:50:36 2024 +0800
[FLINK-34119][doc] Improve description about changelog in document
---
docs/content.zh/docs/deployment/config.md | 3 +--
docs/content.zh/docs/ops/state/state_backends.md | 13 +++++++------
docs/content/docs/deployment/config.md | 3 +--
docs/content/docs/ops/state/state_backends.md | 15 ++++++++-------
4 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/docs/content.zh/docs/deployment/config.md b/docs/content.zh/docs/deployment/config.md
index 34d04f733e5..cf0740bf8de 100644
--- a/docs/content.zh/docs/deployment/config.md
+++ b/docs/content.zh/docs/deployment/config.md
@@ -370,8 +370,7 @@ Advanced options to tune RocksDB and RocksDB checkpoints.
### State Changelog Options
Please refer to [State Backends]({{< ref "docs/ops/state/state_backends#enabling-changelog" >}}) for information on
-using State Changelog. {{< hint warning >}} The feature is in experimental status. {{< /hint >}} {{<
-generated/state_backend_changelog_section >}}
+using State Changelog.
#### FileSystem-based Changelog options
diff --git a/docs/content.zh/docs/ops/state/state_backends.md b/docs/content.zh/docs/ops/state/state_backends.md
index eda37dada7e..5d7d4f92b1c 100644
--- a/docs/content.zh/docs/ops/state/state_backends.md
+++ b/docs/content.zh/docs/ops/state/state_backends.md
@@ -349,10 +349,6 @@ Python API 中尚不支持该特性。
## 开启 Changelog
-{{< hint warning >}} 该功能处于实验状态。 {{< /hint >}}
-
-{{< hint warning >}} 开启 Changelog 可能会给您的应用带来性能损失。(见下文) {{< /hint >}}
-
<a name="introduction"></a>
### 介绍
@@ -372,16 +368,21 @@ Changelog 是一项旨在减少 checkpointing 时间的功能,因此也可以
开启 Changelog 功能之后,Flink 会不断上传状态变更并形成 changelog。创建 checkpoint 时,只有 changelog
中的相关部分需要上传。而配置的状态后端则会定期在后台进行快照,快照成功上传后,相关的changelog 将会被截断。
-基于此,异步阶段的持续时间减少(另外因为不需要将数据刷新到磁盘,同步阶段持续时间也减少了),特别是长尾延迟得到了改善。
+基于此,异步阶段的持续时间减少(另外因为不需要将数据刷新到磁盘,同步阶段持续时间也减少了),特别是长尾延迟得到了改善。同时,还可以获得一些其他好处:
+1. 更稳定、更低的端到端时延。
+2. Failover 后数据重放更少。
+3. 资源利用更加稳定。
但是,资源使用会变得更高:
- 将会在 DFS 上创建更多文件
-- 将可能在 DFS 上残留更多文件(这将在 FLINK-25511 和 FLINK-25512 之后的新版本中被解决)
- 将使用更多的 IO 带宽用来上传状态变更
- 将使用更多 CPU 资源来序列化状态变更
- Task Managers 将会使用更多内存来缓存状态变更
+值得注意的是虽然 Changelog 增加了少量的日常 CPU 和网络带宽资源使用,
+但会降低峰值的 CPU 和网络带宽使用量。
+
另一项需要考虑的事情是恢复时间。取决于 `state.backend.changelog.periodic-materialize.interval`
的设置,changelog 可能会变得冗长,因此重放会花费更多时间。即使这样,恢复时间加上 checkpoint 持续时间仍然可能低于不开启
changelog 功能的时间,从而在故障恢复的情况下也能提供更低的端到端延迟。当然,取决于上述时间的实际比例,有效恢复时间也有可能会增加。
有关更多详细信息,请参阅
[FLIP-158](https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints)。
diff --git a/docs/content/docs/deployment/config.md b/docs/content/docs/deployment/config.md
index cbdc4f25a77..c4e70ba7235 100644
--- a/docs/content/docs/deployment/config.md
+++ b/docs/content/docs/deployment/config.md
@@ -372,8 +372,7 @@ Advanced options to tune RocksDB and RocksDB checkpoints.
### State Changelog Options
Please refer to [State Backends]({{< ref "docs/ops/state/state_backends#enabling-changelog" >}}) for information on
-using State Changelog. {{< hint warning >}} The feature is in experimental status. {{< /hint >}} {{<
-generated/state_backend_changelog_section >}}
+using State Changelog.
#### FileSystem-based Changelog options
diff --git a/docs/content/docs/ops/state/state_backends.md b/docs/content/docs/ops/state/state_backends.md
index b645eefcd8b..bd04491977f 100644
--- a/docs/content/docs/ops/state/state_backends.md
+++ b/docs/content/docs/ops/state/state_backends.md
@@ -346,10 +346,6 @@ Still not supported in Python API.
## Enabling Changelog
-{{< hint warning >}} This feature is in experimental status. {{< /hint >}}
-
-{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
-
### Introduction
Changelog is a feature that aims to decrease checkpointing time and,
therefore, end-to-end latency in exactly-once mode.
@@ -361,7 +357,7 @@ Most commonly, checkpoint duration is affected by:
and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
2. Snapshot creation time (so-called synchronous phase), addressed by asynchronous snapshots (mentioned [above]({{< ref "#the-embeddedrocksdbstatebackend">}}))
-4. Snapshot upload time (asynchronous phase)
+3. Snapshot upload time (asynchronous phase)
Upload time can be decreased by [incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
However, most incremental state backends perform some form of compaction periodically, which results in re-uploading the
@@ -373,16 +369,21 @@ part of this changelog needs to be uploaded. The configured state backend is sna
background periodically. Upon successful upload, the changelog is truncated.
As a result, asynchronous phase duration is reduced, as well as synchronous phase - because no data needs to be flushed
-to disk. In particular, long-tail latency is improved.
+to disk. In particular, long-tail latency is improved. At the same time, some other benefits could be got:
+1. More Stable and Lower End-to-end Latency.
+2. Less Data Replay after Failover.
+3. More Stable Utilization of Resources.
However, resource usage is higher:
- more files are created on DFS
-- more files can be left undeleted DFS (this will be addressed in the future versions in FLINK-25511 and FLINK-25512)
- more IO bandwidth is used to upload state changes
- more CPU used to serialize state changes
- more memory used by Task Managers to buffer state changes
+It is worth noting that changelog adds a small amount of daily CPU and network bandwidth resources,
+but reduces peak CPU and network bandwidth usage.
+
Recovery time is another thing to consider. Depending on the `state.backend.changelog.periodic-materialize.interval`
setting, the changelog can become lengthy and replaying it may take more time. However, recovery time combined with
checkpoint duration will likely still be lower than in non-changelog setups, providing lower end-to-end latency even in