Hi,
I have been trying to figure out how Structured Streaming handles window 
functions efficiently.
The part I understand is that whenever new data arrives, it is grouped by 
time and the aggregated result is merged into the state.
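
To show what I mean by that, here is a minimal plain-Python sketch of my 
mental model. It is not Spark code, and WINDOW_SECONDS, window_start and 
update_state are names I made up for illustration. The assumption is a 
tumbling time window with a running sum as the only state per window.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical tumbling-window size, in seconds


def window_start(event_time):
    # Start of the tumbling window that contains event_time.
    return event_time - (event_time % WINDOW_SECONDS)


def update_state(state, batch):
    # Fold a micro-batch of (event_time, value) pairs into per-window sums.
    # Only the running sum per window is kept, never the original rows.
    for event_time, value in batch:
        state[window_start(event_time)] += value
    return state


state = defaultdict(int)
update_state(state, [(5, 10), (61, 7)])
update_state(state, [(30, 3)])  # late row for window 0 just adds to its sum
print(dict(state))  # {0: 13, 60: 7}
```

Here a late row only changes one number in the state, so nothing has to be 
recomputed from the raw input.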
However, unlike aggregations such as sum, window functions need the original 
rows, and their results can change when data arrives late.
So if I understand correctly, this would mean that we have to save the 
original data and rerun the window function over it every time new data 
arrives.
Is this correct? Are there ways to get around this issue?
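
To make the concern concrete, here is a small plain-Python sketch of the 
naive approach I have in mind, using a lag-style function as the example. 
Again this is not Spark code and not a claim about how Spark works 
internally; add_batch and lag_values are hypothetical names. The buffer 
holds every original row, and each new batch triggers a full recomputation.

```python
import bisect


def add_batch(buffer, batch):
    # Keep every original (event_time, value) row, ordered by event time.
    for row in batch:
        bisect.insort(buffer, row)
    return buffer


def lag_values(buffer):
    # Recompute (event_time, value, previous_value) over the whole buffer.
    out = []
    prev = None
    for event_time, value in buffer:
        out.append((event_time, value, prev))
        prev = value
    return out


buffer = []
add_batch(buffer, [(1, 10), (5, 30)])
first = lag_values(buffer)   # [(1, 10, None), (5, 30, 10)]

add_batch(buffer, [(3, 20)])  # late row lands between the two earlier rows
second = lag_values(buffer)  # [(1, 10, None), (3, 20, 10), (5, 30, 20)]
print(first)
print(second)
```

The late row at time 3 changes the result already emitted for the row at 
time 5 (its previous value goes from 10 to 20), which is why a single 
aggregated value per group does not seem to be enough here.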

Assaf.




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/structured-streaming-and-window-functions-tp19930.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
