This is an automated email from the ASF dual-hosted git repository.
kabhwan pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new a3d23fdb775b [MINOR][SS] Minor update to watermark propagation comments
a3d23fdb775b is described below
commit a3d23fdb775bee3f03c52a77b80bc0c724108e20
Author: Neil Ramaswamy <[email protected]>
AuthorDate: Wed Dec 18 15:45:36 2024 +0900
[MINOR][SS] Minor update to watermark propagation comments
### What changes were proposed in this pull request?
A few minor changes to clarify (and fix one typo) in the comments for
watermark propagation in Structured Streaming.
### Why are the changes needed?
I found some of the terminology around "simulation" confusing, and the
current comment describes incorrect logic for output watermark calculation.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
N/A.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #49188 from neilramaswamy/nr/minor-wm-prop.
Authored-by: Neil Ramaswamy <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
(cherry picked from commit 2b41131d7fa66ef5b23fbe247e057d631ee5e4f6)
Signed-off-by: Jungtaek Lim <[email protected]>
---
.../spark/sql/execution/streaming/WatermarkPropagator.scala | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/WatermarkPropagator.scala
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/WatermarkPropagator.scala
index 6f3725bebb9a..3d9325f9c98c 100644
---
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/WatermarkPropagator.scala
+++
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/WatermarkPropagator.scala
@@ -124,12 +124,14 @@ class UseSingleWatermarkPropagator extends
WatermarkPropagator {
/**
* This implementation simulates propagation of watermark among operators.
*
- * The simulation algorithm traverses the physical plan tree via post-order
(children first) to
- * calculate (input watermark, output watermark) for all nodes.
+ * It is considered a "simulation" because watermarks are not being physically
sent between
+ * operators, but rather propagated up the tree via post-order (children
first) traversal of
+ * the query plan. This allows Structured Streaming to determine the new
(input watermark, output
+ * watermark) for all nodes.
*
* For each node, below logic is applied:
*
- * - Input watermark for specific node is decided by `min(input watermarks
from all children)`.
+ * - Input watermark for specific node is decided by `min(output watermarks
from all children)`.
* -- Children providing no input watermark (DEFAULT_WATERMARK_MS) are
excluded.
* -- If there is no valid input watermark from children, input watermark =
DEFAULT_WATERMARK_MS.
* - Output watermark for specific node is decided as following:
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]