Lingeshwaran Radhakrishnan created SPARK-47698:
--------------------------------------------------

             Summary: Current doc section handling-late-data-and-watermarking is misleading after the support for multiple stateful operators
                 Key: SPARK-47698
                 URL: https://issues.apache.org/jira/browse/SPARK-47698
             Project: Spark
          Issue Type: Documentation
          Components: Documentation
    Affects Versions: 3.4.0
            Reporter: Lingeshwaran Radhakrishnan
         Attachments: image-2024-04-02-15-04-34-287.png

[This section|https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking]
of the doc, which explains the nuances of handling late records using beautiful
diagrams, has become outdated and somewhat misleading since support for
*multiple stateful operators* was introduced in
https://issues.apache.org/jira/browse/SPARK-40925

!image-2024-04-02-15-01-25-523.png!


Previously, the watermark for batch N was used to filter out late inputs in
batch N. With support for multiple stateful operators, late inputs in batch N
are filtered using the watermark for batch N-1 instead. The doc section above
should be updated to reflect this new behavior to avoid confusion.
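The difference can be illustrated with a toy Python model. This is not Spark's implementation. It rests on several assumptions. The watermark in effect for batch N is the maximum event time seen in batches 0..N-1 minus the delay threshold. The watermark never moves backwards, and there is none for batch 0. A record counts as late when its event time is <= the threshold. The function names `watermarks_per_batch` and `late_records` are hypothetical.

```python
from typing import List, Optional


def watermarks_per_batch(batches: List[List[int]], delay: int) -> List[Optional[int]]:
    """Watermark in effect while each batch runs (simplified, assumed model).

    Assumption: the watermark for batch N is max(event time seen in batches
    0..N-1) - delay, it never moves backwards, and batch 0 has no watermark.
    """
    result: List[Optional[int]] = []
    current: Optional[int] = None
    max_seen: Optional[int] = None
    for batch in batches:
        result.append(current)  # watermark used while this batch runs
        if batch:
            batch_max = max(batch)
            max_seen = batch_max if max_seen is None else max(max_seen, batch_max)
            candidate = max_seen - delay
            current = candidate if current is None else max(current, candidate)
    return result


def late_records(batches: List[List[int]], delay: int, use_previous: bool) -> List[List[int]]:
    """Event times dropped as late in each batch (hypothetical helper).

    use_previous=False: batch N filters with the watermark for batch N (old doc behavior).
    use_previous=True:  batch N filters with the watermark for batch N-1 (behavior described above).
    A record is treated as late if event_time <= threshold (assumed predicate).
    """
    wms = watermarks_per_batch(batches, delay)
    dropped = []
    for n, batch in enumerate(batches):
        if use_previous:
            threshold = wms[n - 1] if n > 0 else None
        else:
            threshold = wms[n]
        dropped.append([t for t in batch if threshold is not None and t <= threshold])
    return dropped


if __name__ == "__main__":
    batches = [[20], [30], [15, 25]]
    print(watermarks_per_batch(batches, delay=10))                # [None, 10, 20]
    print(late_records(batches, delay=10, use_previous=False))   # [[], [], [15]]
    print(late_records(batches, delay=10, use_previous=True))    # [[], [], []]
```

In this toy run, the record with event time 15 in the third batch is dropped when the batch's own watermark (20) is used. It is kept when the previous batch's watermark (10) is used. This is the kind of difference the diagrams in the doc section would need to show.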



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
