This is an automated email from the ASF dual-hosted git repository.

martijnvisser pushed a commit to branch release-1.17
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/release-1.17 by this push:
     new aaf72e45b9b [FLINK-28853][Docs] Sync watermark split alignment 
documentation with the Chinese version. This closes #22724
aaf72e45b9b is described below

commit aaf72e45b9b21096c43165dd0535c359ba7f40c9
Author: Mason Chen <[email protected]>
AuthorDate: Tue Jun 6 23:49:56 2023 -0700

    [FLINK-28853][Docs] Sync watermark split alignment documentation with the 
Chinese version. This closes #22724
---
 .../datastream/event-time/generating_watermarks.md | 22 ++++++++++++----------
 docs/content.zh/docs/dev/datastream/sources.md     |  6 ++++++
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git 
a/docs/content.zh/docs/dev/datastream/event-time/generating_watermarks.md 
b/docs/content.zh/docs/dev/datastream/event-time/generating_watermarks.md
index f5f13d239f5..5109c5586a6 100644
--- a/docs/content.zh/docs/dev/datastream/event-time/generating_watermarks.md
+++ b/docs/content.zh/docs/dev/datastream/event-time/generating_watermarks.md
@@ -268,19 +268,21 @@ other sources/tasks which can move the combined watermark 
forward and that way u
 one.
 
 {{< hint warning >}}
-**Note:** As of 1.15, Flink supports aligning across tasks of the same source 
and/or different
-sources. It does not support aligning splits/partitions/shards in the same 
task.
+**Note:** As of Flink 1.17, split level watermark alignment is supported by 
the FLIP-27 source framework.
+Source connectors have to implement an interface to resume and pause splits so 
that splits/partitions/shards
+can be aligned in the same task. More detail on the pause and resume 
interfaces can found in the [Source API]({{< ref "docs/dev/datastream/sources" 
>}}#split-level-watermark-alignment).
 
-In a case where there are e.g. two Kafka partitions that produce watermarks at 
different pace, that
-get assigned to the same task watermark might not behave as expected. 
Fortunately, worst case it
-should not perform worse than without alignment.
+If you are upgrading from a Flink version between 1.15.x and 1.16.x inclusive, 
you can disable split level alignment by setting
+`pipeline.watermark-alignment.allow-unaligned-source-splits` to true. 
Moreover, you can tell if your source supports split level alignment
+by checking if it throws an `UnsupportedOperationException` at runtime or by 
reading the javadocs. In this case, it would be desirable to
+to disable split level watermark alignment to avoid fatal exceptions.
 
-Given the limitation above, we suggest applying watermark alignment in two 
situations:
-
-1. You have two different sources (e.g. Kafka and File) that produce 
watermarks at different speeds
-2. You run your source with parallelism equal to the number of 
splits/shards/partitions, which
-   results in every subtask being assigned a single unit of work.
+When setting the flag to true, watermark alignment will be only working 
properly when the number of splits/shards/partitions is equal to the
+parallelism of the source operator. This results in every subtask being 
assigned a single unit of work. On the other hand, if there are two Kafka 
partitions, which produce watermarks at different paces and
+get assigned to the same task, then watermarks might not behave as expected. 
Fortunately, even in the worst case, the basic alignment should not perform 
worse than having no alignment at all.
 
+Furthermore, Flink also supports aligning across tasks of the same sources 
and/or different
+sources, which is useful when you have two different sources (e.g. Kafka and 
File) that produce watermarks at different speeds.
 {{< /hint >}}
 
 <a name="writing-watermarkgenerators"></a>
diff --git a/docs/content.zh/docs/dev/datastream/sources.md 
b/docs/content.zh/docs/dev/datastream/sources.md
index a18d38060e6..a28383ef6fa 100644
--- a/docs/content.zh/docs/dev/datastream/sources.md
+++ b/docs/content.zh/docs/dev/datastream/sources.md
@@ -447,3 +447,9 @@ environment.from_source(
 使用 *SplitReader API* 实现源连接器时,将自动进行处理。所有基于 SplitReader API 
的实现都具有开箱即用(out-of-the-box)的分片水印。
 
 为了保证更底层的 `SourceReader` API 
可以使用每个分片的水印生成,必须将不同分片的事件输送到不同的输出(outputs)中:*局部分片(Split-local) SourceOutputs*。通过 
`createOutputForSplit(splitId)` 和 `releaseOutputForSplit(splitId)` 方法,可以在总 {{< 
gh_link 
file="flink-core/src/main/java/org/apache/flink/api/connector/source/ReaderOutput.java"
 name="ReaderOutput" >}} 上创建并发布局部分片输出。有关详细信息,请参阅该类和方法的 Java 文档。
+
+#### Split Level Watermark Alignment
+
+Although source operator watermark alignment is handled by Flink runtime, the 
source needs to additionally implement `SourceReader#pauseOrResumeSplits` and 
`SplitReader#pauseOrResumeSplits` to achieve split level watermark alignment. 
Split level watermark alignment is useful for when
+there are multiple splits assigned to a source reader. By default, these 
implementations will throw an `UnsupportedOperationException`, 
`pipeline.watermark-alignment.allow-unaligned-source-splits` is set to false, 
when there is more than one split assigned, and the split exceeds the watermark 
alignment threshold configured by the `WatermarkStrategy`. `SourceReaderBase`
+contains an implementation for `SourceReader#pauseOrResumeSplits` so that 
inheriting sources only need to implement `SplitReader#pauseOrResumeSplits`. 
See the javadocs for more implementation hints.

Reply via email to