[ 
https://issues.apache.org/jira/browse/SPARK-47698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lingeshwaran Radhakrishnan updated SPARK-47698:
-----------------------------------------------
    Description: 
[This section|https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking]
 of the doc, which explains the nuances of handling late records using beautiful 
diagrams, is now out of date and somewhat misleading since support for 
*multiple stateful operators* was introduced with 
https://issues.apache.org/jira/browse/SPARK-40925 

!image-2024-04-02-15-04-34-287.png!

 

Previously, the watermark of batch N was used to filter out late inputs in 
batch N. With support for multiple stateful operators, the watermark of 
batch N-1 is used instead. The doc section above should reflect this new 
behavior to avoid confusion.

  was:
[This section|https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking]
 of the doc, which explains the nuances of handling late records using beautiful 
diagrams, is now out of date and somewhat misleading since support for 
*multiple stateful operators* was introduced with 
https://issues.apache.org/jira/browse/SPARK-40925 

!image-2024-04-02-15-01-25-523.png!

 

Previously, the watermark of batch N was used to filter out late inputs in 
batch N. With support for multiple stateful operators, the watermark of 
batch N-1 is used instead. The doc section above should reflect this new 
behavior to avoid confusion.


> Current doc section handling-late-data-and-watermarking is misleading after 
> the support for multiple stateful operators
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-47698
>                 URL: https://issues.apache.org/jira/browse/SPARK-47698
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 3.4.0
>            Reporter: Lingeshwaran Radhakrishnan
>            Priority: Minor
>         Attachments: image-2024-04-02-15-04-34-287.png
>
>
> [This section|https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking]
>  of the doc, which explains the nuances of handling late records using 
> beautiful diagrams, is now out of date and somewhat misleading since 
> support for *multiple stateful operators* was introduced with 
> https://issues.apache.org/jira/browse/SPARK-40925 
> !image-2024-04-02-15-04-34-287.png!
>  
> Previously, the watermark of batch N was used to filter out late inputs in 
> batch N. With support for multiple stateful operators, the watermark of 
> batch N-1 is used instead. The doc section above should reflect this new 
> behavior to avoid confusion.
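
One reading of the batch N versus batch N-1 behavior described above can be sketched as a toy model in plain Python. This is not Spark code and not how Spark implements watermarking. The function name `kept_records`, the `lag` parameter, the integer event times, and the rule that a batch's watermark is the largest event time seen in earlier batches minus the delay are all assumptions made for illustration.

```python
def kept_records(batches, delay, lag):
    """Toy model of late-record filtering across micro-batches.

    batches: list of lists of integer event times (hypothetical data).
    delay:   allowed lateness, in the same unit as the event times.
    lag:     0 filters batch N with the watermark of batch N,
             1 filters batch N with the watermark of batch N-1.
    Returns the records kept in each batch.
    """
    watermarks = []   # watermarks[n] is the watermark associated with batch n
    max_seen = None   # largest event time seen in earlier batches
    kept = []
    for n, batch in enumerate(batches):
        # Assumption: the watermark of batch n depends only on earlier batches
        # and never moves backwards.
        candidate = 0 if max_seen is None else max_seen - delay
        previous = watermarks[-1] if watermarks else 0
        watermarks.append(max(candidate, previous))

        # Pick which batch's watermark is used as the late-record threshold.
        source = n - lag
        threshold = watermarks[source] if source >= 0 else 0
        kept.append([t for t in batch if t >= threshold])

        if batch:
            batch_max = max(batch)
            max_seen = batch_max if max_seen is None else max(max_seen, batch_max)
    return kept


batches = [[10, 12], [20, 6], [14, 5, 21]]
same_batch = kept_records(batches, delay=5, lag=0)      # watermark of batch N
previous_batch = kept_records(batches, delay=5, lag=1)  # watermark of batch N-1
```

In this toy data, the record with event time 6 in the second batch and the record with event time 14 in the third batch are dropped when `lag=0` but kept when `lag=1`. That difference is the kind of thing the diagrams in the doc section would need to show.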



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
