And the JIRA: https://issues.apache.org/jira/browse/SPARK-18124

On Wed, Oct 26, 2016 at 4:56 PM, Tathagata Das <t...@databricks.com> wrote:

> Hey all,
>
> We are planning to implement watermarking in Structured Streaming, which
> would allow us to handle late, out-of-order data better. Specifically, when
> we are aggregating over windows on event time, we can currently end up
> keeping an unbounded amount of data as state. We want to define watermarks
> on the event time in order to mark and drop data that is "too late" and,
> accordingly, age out old aggregates that will no longer be updated.
>
> To enable the user to specify details like the lateness threshold, we are
> considering adding a new method to Dataset. We would like to get more
> feedback on this API. Here is the design doc:
>
> https://docs.google.com/document/d/1z-Pazs5v4rA31azvmYhu4I5xwqaNQl6ZLIS03xhkfCQ/
>
> Please comment on the design and proposed APIs.
>
> Thank you very much!
>
> TD
>
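
For readers skimming the thread, the idea described above (drop data that is
"too late" and age out aggregates that can no longer change) can be sketched
in plain Python. This is only an illustration under assumptions: it is not
Spark code and not the API proposed in the design doc. The class name
WatermarkedWindowCounter, its parameters, and the rule "watermark = maximum
event time seen minus the lateness threshold" are all hypothetical choices
made for this sketch.

```python
class WatermarkedWindowCounter:
    """Toy event-time window counter (hypothetical; not Spark's API)."""

    def __init__(self, window_size, lateness):
        self.window_size = window_size  # width of each tumbling window
        self.lateness = lateness        # assumed lateness threshold
        self.max_event_time = None      # largest event time seen so far
        self.state = {}                 # window start -> count (still open)
        self.finalized = {}             # window start -> count (aged out)
        self.dropped = 0                # events rejected as too late

    def watermark(self):
        # Assumption for this sketch: the watermark trails the maximum
        # observed event time by the lateness threshold.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.lateness

    def add(self, event_time):
        wm = self.watermark()
        start = event_time - event_time % self.window_size
        # An event is "too late" if its window already ended at or
        # before the current watermark.
        if wm is not None and start + self.window_size <= wm:
            self.dropped += 1
            return False
        self.state[start] = self.state.get(start, 0) + 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        self._evict()
        return True

    def _evict(self):
        # Age out windows that can no longer receive accepted events,
        # which keeps the retained state bounded.
        wm = self.watermark()
        closed = [s for s in self.state if s + self.window_size <= wm]
        for s in closed:
            self.finalized[s] = self.state.pop(s)


counter = WatermarkedWindowCounter(window_size=10, lateness=5)
for t in [1, 12, 3, 27, 8]:
    counter.add(t)
```

In this run, the out-of-order event at time 3 is still accepted because its
window has not yet passed the watermark, while the event at time 8 arrives
after the watermark has moved beyond its window and is dropped.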
