Lingeshwaran Radhakrishnan created SPARK-47698:
--------------------------------------------------
Summary: Current doc section handling-late-data-and-watermarking
is misleading after the support for multiple stateful operators
Key: SPARK-47698
URL: https://issues.apache.org/jira/browse/SPARK-47698
Project: Spark
Issue Type: Documentation
Components: Documentation
Affects Versions: 3.4.0
Reporter: Lingeshwaran Radhakrishnan
Attachments: image-2024-04-02-15-04-34-287.png
[This section|https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking]
of the doc, which uses diagrams to explain the nuances of handling late records,
is now out of date and somewhat misleading since support for *multiple stateful
operators* was introduced with
https://issues.apache.org/jira/browse/SPARK-40925
!image-2024-04-02-15-01-25-523.png!
Previously, the watermark for batch N was applied to filter out late inputs in
batch N. With support for multiple stateful operators, the watermark for
batch N-1 is applied instead. The doc section above should be updated to
reflect this new behavior and avoid confusion.
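
To make the difference concrete, here is a toy sketch in plain Python. It is not Spark code and does not model how Spark computes watermarks. The function `filter_batches`, the `offset` parameter, and the sample event times and per-batch watermark values are all hypothetical, chosen only to illustrate the two behaviors described above.

```python
def filter_batches(batches, watermarks, offset):
    """Filter each batch n against watermarks[n - offset].

    offset=0 models "watermark for batch N applied to batch N".
    offset=1 models "watermark for batch N-1 applied to batch N".
    Rows with an event time strictly below the threshold are dropped.
    """
    kept = []
    for n, batch in enumerate(batches):
        idx = n - offset
        # The first batch has no predecessor, so nothing is dropped for it
        # when offset=1 (assumption made for this toy model).
        threshold = watermarks[idx] if idx >= 0 else float("-inf")
        kept.append([t for t in batch if t >= threshold])
    return kept


# Hypothetical event times per batch and one watermark value per batch.
batches = [[10, 12], [25, 11], [14, 30]]
watermarks = [2, 15, 20]

same_batch = filter_batches(batches, watermarks, offset=0)
previous_batch = filter_batches(batches, watermarks, offset=1)

print(same_batch)      # [[10, 12], [25], [30]]
print(previous_batch)  # [[10, 12], [25, 11], [30]]
```

In this sketch the row with event time 11 in the second batch is dropped under the old reading but kept under the new one, which is the kind of discrepancy the current diagrams do not show.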