Hi, I have been trying to figure out how Structured Streaming handles window functions efficiently. The part I understand is that whenever new data arrives, it is grouped by time and the aggregated result is merged into the state. However, unlike aggregations such as sum, window functions need the original rows, and their results can change when data arrives late. So if I understand correctly, this would mean that we have to keep the original data and rerun the window function over it every time new data arrives. Is this correct? Are there ways to get around this issue?
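To make the concern concrete, here is a minimal plain-Python sketch of my mental model. It is not Spark code and does not reflect how Spark is actually implemented; the names `WINDOW_SECONDS`, `IncrementalSum`, and `BufferedRank` are made up for illustration, and the ranking by timestamp stands in for any window function that depends on the other rows in its partition.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # hypothetical tumbling-window width, chosen for illustration


def window_start(ts):
    """Map an event timestamp (in seconds) to the start of its tumbling window."""
    return ts - ts % WINDOW_SECONDS


class IncrementalSum:
    """Sum per window: the state is one number per window, raw rows are discarded."""

    def __init__(self):
        self.state = defaultdict(int)

    def update(self, batch):
        for ts, value in batch:
            # A late row only needs to be added to the existing partial sum.
            self.state[window_start(ts)] += value
        return dict(self.state)


class BufferedRank:
    """Rank rows by timestamp within each window: the state is every raw row."""

    def __init__(self):
        self.rows = defaultdict(list)

    def update(self, batch):
        touched = set()
        for ts, value in batch:
            w = window_start(ts)
            self.rows[w].append((ts, value))
            touched.add(w)
        # A late row can shift the rank of rows already seen, so every touched
        # window is recomputed from its full buffer of rows.
        return {
            w: [(ts, v, rank)
                for rank, (ts, v) in enumerate(sorted(self.rows[w]), start=1)]
            for w in touched
        }
```

In this sketch the sum keeps a constant amount of state per window, while the rank has to keep and re-sort every row of any window that receives new data. That second cost is what I am asking about.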
Assaf.