Re: Should Pipeline wait till all processing time timers fire before exit?

Robert Bradshaw Wed, 26 Jul 2017 12:24:38 -0700

On Wed, Jul 26, 2017 at 7:45 AM, Lukasz Cwik <lc...@google.com.invalid> wrote:
> Robert, in your case where output is being produced based upon a heartbeat,
> either the watermark on the output went to infinity and all that data being
> produced is droppable at which point the timer becomes droppable


But why are these timers *more* droppable than the ones that were
scheduled previously (as per the original question)?

> or the
> output watermark is being held by the scheduling of the next timer and
> hence the pipeline is not yet done.

Good point. I was thinking more about drain.

> On Tue, Jul 25, 2017 at 5:34 PM, Eugene Kirpichov <
> kirpic...@google.com.invalid> wrote:
>
>> Yes, and I think in this case the pipeline should never transition to DONE.
>>
>> On Tue, Jul 25, 2017 at 3:42 PM Robert Bradshaw
>> <rober...@google.com.invalid>
>> wrote:
>>
>> > I generally agree, but it's unclear what to do with timers that are
>> > scheduled during the execution of existing timers. (For example, a
>> > "heartbeat" source may process a timer by emitting an element and
>> > scheduling a timer for the future. One would never be able to fire "all"
>> > timers. I suppose this isn't unique to processing time timers.)
>> >
>> > On Fri, Jul 21, 2017 at 8:23 AM, Kenneth Knowles <k...@google.com.invalid
>> >
>> > wrote:
>> >
>> > > I think the best answer is "yes" we should fire all timers before exit.
>> > >
>> > > This is the subject of https://issues.apache.org/jira/browse/BEAM-2535
>> > > which
>> > > is a fairly significant enhancement to the model. In this proposal,
>> every
>> > > timer is treated like an input with a timestamp and that is independent
>> > of
>> > > the specification of when to deliver the input.
>> > >
>> > > Right now, processing time timers have no event time timestamp
>> associated
>> > > with them, nor any watermark hold. So the window expires and they are
>> > > dropped as late data eventually. This is correct according to the
>> current
>> > > situation, but we should change it.
>> > >
>> > > However, I don't think a pipeline should necessarily actually wait in
>> > > processing time. One of the main uses of the unified batch/streaming
>> > model
>> > > is to do historical re-processing using the same logic that you used
>> for
>> > > real-time processing. So in a historical "batch" query, you want all
>> the
>> > > same callbacks, but you should call them as fast as possible.
>> > Semantically,
>> > > it is the same as a fast clock / slow computation anyhow.
>> > >
>> > > Kenn
>> > >
>> > > On Fri, Jul 21, 2017 at 6:38 AM, Shen Li <cs.she...@gmail.com> wrote:
>> > >
>> > > > If max watermarks arrive at all transforms before some processing
>> time
>> > > > timers fire, should the Pipeline wait till all timers fire before
>> > turning
>> > > > to DONE state?
>> > > >
>> > > > Thanks,
>> > > > Shen
>> > > >
>> > >
>> >
>>

Re: Should Pipeline wait till all processing time timers fire before exit?

Reply via email to