Hi Jaromir,

deterministic processing with late elements is indeed more difficult than
without them. What you have to do is to send updates to your downstream
operators in case that you see late elements. This can either be an
incremental update or a retraction with the corrected value. It basically
depends on the operation you're performing. So in your example, you could
define an allowed lateness for which you keep the window contents around.
If the element with value 6 arrives within the allowed lateness (and after
the watermark) emit the corrected sum for the window. Your downstream
operators have to be able to handle updated windows, though. At the moment
this is something the user has to take care of and there is only little
support from Flink's side. But we're working on improving the late element
handling with [1].

[1] https://github.com/apache/flink/pull/2415

Cheers,
Till

On Fri, Nov 4, 2016 at 11:38 PM, Jaromir Vanek <vanek.jaro...@gmail.com>
wrote:

> Is Flink processing really repeatedly deterministic when incoming stream of
> elements is out-of-order? How is it ensured?
>
> I am aware of all the principles like event time and watermarking. But I
> can't understand how it works in case there are late elements in stream -
> that means there are elements violating the watermark condition - having
> lower timestamp than previously emitted watermark. From my point of view
> these elements will flow through the system without any mechanism that
> would
> discard them. Late elements then may, or may not, fall into existing
> windows.
>
> Let's draw simple example reading from Kafka source with two partitions.
> Numbers are representing event time. Data from Kafka are shuffled to the
> one
> WindowOperator calculating sum (all elements have the same key).
>
>
> ---------------------------------------|
> part. 1   | ..., 15, 12, 9  |  ------->|
> |
>                                                 |          WindowOperator
> |
> part. 2   | ..., 18, 6, 11  |  ------->|    (window maxTimestamp 10)    |
>
> -----------------------------------------
>
> Elements can arrive to WindowOperator in arbitrary order
>
> example1 (E denotes element, W denotes watermark)
>
> E 9
> W 9
> E 11
> W 11 (current watermark: min(9, 11) = 9)
> E 6
> E 12
> W 12 (current watermark: min(9, 12) = 12)
> ---------------
> window(0, 10) fires with sum 15
>
> example2:
>
> E 9
> W 9
> E 11
> W 11 (current watermark: min(9, 11) = 9)
> E 12
> W 12 (current watermark: min(9, 12) = 12)
> ---------------
> window fires with sum 9
>
>
> In my example result is not deterministic, it's more less random. Is there
> anything I am missing?
>
> Thank you very much for explanation.
>
>
>
> --
> View this message in context: http://apache-flink-mailing-
> list-archive.1008284.n3.nabble.com/Deterministic-
> processing-with-out-of-order-streams-tp14409.html
> Sent from the Apache Flink Mailing List archive. mailing list archive at
> Nabble.com.
>

Reply via email to