On Wed, Jul 26, 2017 at 12:23 PM, Robert Bradshaw <[email protected]>
wrote:

> On Wed, Jul 26, 2017 at 7:45 AM, Lukasz Cwik <[email protected]>
> wrote:
> > Robert, in your case where output is being produced based upon a
> > heartbeat, either the watermark on the output went to infinity and all
> > that data being produced is droppable, at which point the timer becomes
> > droppable,
>
> But why are these timers *more* droppable than the ones that were
> scheduled previously (as per the original question)?


Thinking about this some more in the context of DONE, drain, and update: in
the drain case, the output watermark goes to infinity and I believe we
should drop all timers. In the update case, the timers should be preserved,
like state, from one pipeline to the next. Finally, this should never
happen in the DONE case, since DONE means we fired all timers and processed
all data (which is different from drain, where we know we may have dropped
timers). I don't believe we will have a case where a timer's watermark hold
can be independent of its timestamp, which effectively holds the output
watermark. If there were a case where the output watermark could be
controlled independently of the timers, then I could see that you would
still fire existing timers but prevent scheduling new ones, since new
timers should be allowed to be dropped.
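To make the three cases concrete, here is a toy sketch (plain Python, not
Beam's actual API; all names here are hypothetical) of how pending timers
might be handled under drain, update, and DONE as described above:

```python
# Toy sketch (NOT Beam's real API) of the three shutdown modes
# discussed above. Pending timers are a map of tag -> timestamp;
# DONE is only reachable once no timer is still pending.

import math

def drain(timers):
    """Drain: advance the output watermark to infinity and drop
    every pending timer; their output would be droppable anyway."""
    timers.clear()
    return math.inf  # new output watermark

def update(timers):
    """Update: pending timers are preserved, like user state, and
    handed to the next version of the pipeline."""
    return dict(timers)  # snapshot carried across the update

def is_done(timers, input_watermark):
    """DONE: only legal once all input is processed AND all timers
    have fired -- i.e. nothing was silently dropped."""
    return input_watermark == math.inf and not timers

timers = {"heartbeat": 1500}
carried = update(timers)              # update keeps the timer
assert carried == {"heartbeat": 1500}
assert not is_done(timers, math.inf)  # a pending timer blocks DONE
assert drain(timers) == math.inf      # drain drops it
assert is_done(timers, math.inf)      # now DONE is reachable
```

The sketch just encodes the invariant argued for above: a pipeline that
still holds timers cannot be DONE, drain discards them, and update carries
them forward.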

> > or the output watermark is being held by the scheduling of the next
> > timer and hence the pipeline is not yet done.
>
> Good point. I was thinking more about drain.
>
> > On Tue, Jul 25, 2017 at 5:34 PM, Eugene Kirpichov
> > <[email protected]> wrote:
> >
> >> Yes, and I think in this case the pipeline should never transition to
> >> DONE.
> >>
> >> On Tue, Jul 25, 2017 at 3:42 PM Robert Bradshaw
> >> <[email protected]>
> >> wrote:
> >>
> >> > I generally agree, but it's unclear what to do with timers that are
> >> > scheduled during the execution of existing timers. (For example, a
> >> > "heartbeat" source may process a timer by emitting an element and
> >> > scheduling a timer for the future. One would never be able to fire
> >> > "all" timers. I suppose this isn't unique to processing time timers.)
> >> >
> >> > On Fri, Jul 21, 2017 at 8:23 AM, Kenneth Knowles <[email protected]>
> >> > wrote:
> >> >
> >> > > I think the best answer is "yes" we should fire all timers before
> >> > > exit.
> >> > >
> >> > > This is the subject of
> >> > > https://issues.apache.org/jira/browse/BEAM-2535, which is a fairly
> >> > > significant enhancement to the model. In this proposal, every timer
> >> > > is treated like an input with a timestamp, and that is independent
> >> > > of the specification of when to deliver the input.
> >> > >
> >> > > Right now, processing time timers have no event time timestamp
> >> > > associated with them, nor any watermark hold. So the window expires
> >> > > and they are dropped as late data eventually. This is correct
> >> > > according to the current situation, but we should change it.
> >> > >
> >> > > However, I don't think a pipeline should necessarily actually wait
> >> > > in processing time. One of the main uses of the unified
> >> > > batch/streaming model is to do historical re-processing using the
> >> > > same logic that you used for real-time processing. So in a
> >> > > historical "batch" query, you want all the same callbacks, but you
> >> > > should call them as fast as possible. Semantically, it is the same
> >> > > as a fast clock / slow computation anyhow.
> >> > >
> >> > > Kenn
> >> > >
> >> > > On Fri, Jul 21, 2017 at 6:38 AM, Shen Li <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > If max watermarks arrive at all transforms before some processing
> >> > > > time timers fire, should the Pipeline wait till all timers fire
> >> > > > before turning to DONE state?
> >> > > >
> >> > > > Thanks,
> >> > > > Shen
> >> > > >
> >> > >
> >> >
> >>
>
