And the JIRA: https://issues.apache.org/jira/browse/SPARK-18124
On Wed, Oct 26, 2016 at 4:56 PM, Tathagata Das <t...@databricks.com> wrote: > Hey all, > > We are planning implement watermarking in Structured Streaming that would > allow us handle late, out-of-order data better. Specially, when we are > aggregating over windows on event-time, we currently can end up keeping > unbounded amount data as state. We want to define watermarks on the event > time in order mark and drop data that are "too late" and accordingly age > out old aggregates that will not be updated any more. > > To enable the user to specify details like lateness threshold, we are > considering adding a new method to Dataset. We would like to get more > feedback on this API. Here is the design doc > > https://docs.google.com/document/d/1z-Pazs5v4rA31azvmYhu4I5x > wqaNQl6ZLIS03xhkfCQ/ > > Please comment on the design and proposed APIs. > > Thank you very much! > > TD >