This is an automated email from the ASF dual-hosted git repository.
martijnvisser pushed a commit to branch release-1.17
in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/release-1.17 by this push:
new aaf72e45b9b [FLINK-28853][Docs] Sync watermark split alignment
documentation with the Chinese version. This closes #22724
aaf72e45b9b is described below
commit aaf72e45b9b21096c43165dd0535c359ba7f40c9
Author: Mason Chen <[email protected]>
AuthorDate: Tue Jun 6 23:49:56 2023 -0700
[FLINK-28853][Docs] Sync watermark split alignment documentation with the
Chinese version. This closes #22724
---
.../datastream/event-time/generating_watermarks.md | 22 ++++++++++++----------
docs/content.zh/docs/dev/datastream/sources.md | 6 ++++++
2 files changed, 18 insertions(+), 10 deletions(-)
diff --git
a/docs/content.zh/docs/dev/datastream/event-time/generating_watermarks.md
b/docs/content.zh/docs/dev/datastream/event-time/generating_watermarks.md
index f5f13d239f5..5109c5586a6 100644
--- a/docs/content.zh/docs/dev/datastream/event-time/generating_watermarks.md
+++ b/docs/content.zh/docs/dev/datastream/event-time/generating_watermarks.md
@@ -268,19 +268,21 @@ other sources/tasks which can move the combined watermark
forward and that way u
one.
{{< hint warning >}}
-**Note:** As of 1.15, Flink supports aligning across tasks of the same source
and/or different
-sources. It does not support aligning splits/partitions/shards in the same
task.
+**Note:** As of Flink 1.17, split level watermark alignment is supported by
the FLIP-27 source framework.
+Source connectors have to implement an interface to resume and pause splits so
that splits/partitions/shards
+can be aligned in the same task. More detail on the pause and resume
interfaces can found in the [Source API]({{< ref "docs/dev/datastream/sources"
>}}#split-level-watermark-alignment).
-In a case where there are e.g. two Kafka partitions that produce watermarks at
different pace, that
-get assigned to the same task watermark might not behave as expected.
Fortunately, worst case it
-should not perform worse than without alignment.
+If you are upgrading from a Flink version between 1.15.x and 1.16.x inclusive,
you can disable split level alignment by setting
+`pipeline.watermark-alignment.allow-unaligned-source-splits` to true.
Moreover, you can tell if your source supports split level alignment
+by checking if it throws an `UnsupportedOperationException` at runtime or by
reading the javadocs. In this case, it would be desirable to
+to disable split level watermark alignment to avoid fatal exceptions.
-Given the limitation above, we suggest applying watermark alignment in two
situations:
-
-1. You have two different sources (e.g. Kafka and File) that produce
watermarks at different speeds
-2. You run your source with parallelism equal to the number of
splits/shards/partitions, which
- results in every subtask being assigned a single unit of work.
+When setting the flag to true, watermark alignment will be only working
properly when the number of splits/shards/partitions is equal to the
+parallelism of the source operator. This results in every subtask being
assigned a single unit of work. On the other hand, if there are two Kafka
partitions, which produce watermarks at different paces and
+get assigned to the same task, then watermarks might not behave as expected.
Fortunately, even in the worst case, the basic alignment should not perform
worse than having no alignment at all.
+Furthermore, Flink also supports aligning across tasks of the same sources
and/or different
+sources, which is useful when you have two different sources (e.g. Kafka and
File) that produce watermarks at different speeds.
{{< /hint >}}
<a name="writing-watermarkgenerators"></a>
diff --git a/docs/content.zh/docs/dev/datastream/sources.md
b/docs/content.zh/docs/dev/datastream/sources.md
index a18d38060e6..a28383ef6fa 100644
--- a/docs/content.zh/docs/dev/datastream/sources.md
+++ b/docs/content.zh/docs/dev/datastream/sources.md
@@ -447,3 +447,9 @@ environment.from_source(
使用 *SplitReader API* 实现源连接器时,将自动进行处理。所有基于 SplitReader API
的实现都具有开箱即用(out-of-the-box)的分片水印。
为了保证更底层的 `SourceReader` API
可以使用每个分片的水印生成,必须将不同分片的事件输送到不同的输出(outputs)中:*局部分片(Split-local) SourceOutputs*。通过
`createOutputForSplit(splitId)` 和 `releaseOutputForSplit(splitId)` 方法,可以在总 {{<
gh_link
file="flink-core/src/main/java/org/apache/flink/api/connector/source/ReaderOutput.java"
name="ReaderOutput" >}} 上创建并发布局部分片输出。有关详细信息,请参阅该类和方法的 Java 文档。
+
+#### Split Level Watermark Alignment
+
+Although source operator watermark alignment is handled by Flink runtime, the
source needs to additionally implement `SourceReader#pauseOrResumeSplits` and
`SplitReader#pauseOrResumeSplits` to achieve split level watermark alignment.
Split level watermark alignment is useful for when
+there are multiple splits assigned to a source reader. By default, these
implementations will throw an `UnsupportedOperationException`,
`pipeline.watermark-alignment.allow-unaligned-source-splits` is set to false,
when there is more than one split assigned, and the split exceeds the watermark
alignment threshold configured by the `WatermarkStrategy`. `SourceReaderBase`
+contains an implementation for `SourceReader#pauseOrResumeSplits` so that
inheriting sources only need to implement `SplitReader#pauseOrResumeSplits`.
See the javadocs for more implementation hints.