On Thu, Aug 3, 2017 at 7:05 AM, Lukasz Cwik <[email protected]>
wrote:

> On Wed, Jul 26, 2017 at 12:23 PM, Robert Bradshaw <
> [email protected]> wrote:
>
> > On Wed, Jul 26, 2017 at 7:45 AM, Lukasz Cwik <[email protected]>
> > wrote:
> > > Robert, in your case where output is being produced based upon a
> > heartbeat,
> > > either the watermark on the output went to infinity and all that data
> > being
> > > produced is droppable at which point the timer becomes droppable
> >
> > But why are these timers *more* droppable than the ones that were
> > scheduled previously (as per the original question)?
>
>
> Thinking about this some more in the context of DONE, drain, and update. In
> the drain case, the output watermark goes to infinity and I believe we
> should drop all timers. In the update case, the timers should be preserved
> like state from one pipeline to the next. Finally, this should never happen
> for the DONE case as this means we fired all timers and processed all data
> (which is different from drain since we know we may have dropped timers). I
> don't believe we will have a case where a timer watermark can be
> independent of its timestamp which effectively holds the output watermark.
> If there was a case where the output watermark could be controlled
> independently of the timers then I could see that you would still fire
> existing timers but prevent scheduling new timers since new timers should
> be allowed to be dropped.
>
>
In the details of https://issues.apache.org/jira/browse/BEAM-2535 is the
fact that the timer loopback input channel needs a watermark separate from
the element input channel. The timestamp associated with the timer
(independent of its delivery logic) holds the watermark on that channel,
which automatically holds the output watermark. So it won't prevent new
timers from being set (I went through the same logic in my head, too).

If I read this discussion correctly, normal termination and update "just
work", and it seems like the most unresolved bit is whether or not we want
users to write loops equivalent to "while(true) { ... }" that we break when
we issue a drain command. The semantics are clearer if we just treat these
as infinite loops, but then naive uses of timers create undrainable
pipelines. But if drain can drop timers, making those loops drainable, it
seems like a situation the user has not considered, so possibly resulting
in data loss.

Kenn

Reply via email to