[ https://issues.apache.org/jira/browse/SPARK-47698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lingeshwaran Radhakrishnan updated SPARK-47698:
-----------------------------------------------
Description:
[This
section|https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking]
of the doc, which explains the nuances of handling late records using beautiful
diagrams, is now out of date and somewhat misleading since support for
*multiple stateful operators* was introduced with
https://issues.apache.org/jira/browse/SPARK-40925
!image-2024-04-02-15-04-34-287.png!
Previously, the watermark for batch N was applied to filter out late inputs in
batch N. With support for multiple stateful operators, the watermark for
batch N-1 is applied instead. The doc section above should reflect this new
behavior to avoid confusion.
was:
[This
section|https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking]
of the doc, which explains the nuances of handling late records using beautiful
diagrams, is now out of date and somewhat misleading since support for
*multiple stateful operators* was introduced with
https://issues.apache.org/jira/browse/SPARK-40925
!image-2024-04-02-15-01-25-523.png!
Previously, the watermark for batch N was applied to filter out late inputs in
batch N. With support for multiple stateful operators, the watermark for
batch N-1 is applied instead. The doc section above should reflect this new
behavior to avoid confusion.
> Current doc section handling-late-data-and-watermarking is misleading after
> the support for multiple stateful operators
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-47698
> URL: https://issues.apache.org/jira/browse/SPARK-47698
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 3.4.0
> Reporter: Lingeshwaran Radhakrishnan
> Priority: Minor
> Attachments: image-2024-04-02-15-04-34-287.png
>
>
> [This
> section|https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking]
> of the doc, which explains the nuances of handling late records using
> beautiful diagrams, is now out of date and somewhat misleading since support
> for *multiple stateful operators* was introduced with
> https://issues.apache.org/jira/browse/SPARK-40925
> !image-2024-04-02-15-04-34-287.png!
>
> Previously, the watermark for batch N was applied to filter out late inputs in
> batch N. With support for multiple stateful operators, the watermark for
> batch N-1 is applied instead. The doc section above should reflect this new
> behavior to avoid confusion.
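
The difference between the two behaviors described in this issue can be
illustrated with a small toy model. This is only a sketch under stated
assumptions, not Spark's implementation: the function name filter_late, the
lag parameter, the sample event times, and the per-batch watermark values are
all hypothetical, and the model assumes a record is late when its event time
is at or below the watermark.

```python
from typing import List, Sequence


def filter_late(
    batches: Sequence[Sequence[int]],
    watermarks: Sequence[int],
    lag: int,
) -> List[List[int]]:
    """Toy model: drop late records from each batch.

    Batch n is filtered with watermarks[n - lag]:
      lag=0 -> batch N uses the watermark for batch N
               (the earlier behavior, as described in this issue)
      lag=1 -> batch N uses the watermark for batch N-1
               (the behavior described for multiple stateful operators)
    A batch with no applicable watermark (n - lag < 0) passes through
    unfiltered. Assumption: a record is late if event_time <= watermark.
    """
    kept: List[List[int]] = []
    for n, events in enumerate(batches):
        idx = n - lag
        if idx < 0:
            kept.append(list(events))
        else:
            wm = watermarks[idx]
            kept.append([t for t in events if t > wm])
    return kept


# Hypothetical event times (arbitrary units) for three micro-batches.
batches = [[10, 12], [5, 14], [11, 20]]
# Hypothetical watermark value associated with each batch.
watermarks = [0, 8, 13]

old_behavior = filter_late(batches, watermarks, lag=0)
new_behavior = filter_late(batches, watermarks, lag=1)

print(old_behavior)  # [[10, 12], [14], [20]]
print(new_behavior)  # [[10, 12], [5, 14], [11, 20]]
```

In this model, the records with event times 5 and 11 are dropped when batch N
is filtered with its own watermark, but are kept when the watermark for batch
N-1 is used, which is the kind of difference the diagrams in the doc section
would need to show.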