On Tue, Feb 27, 2024 at 10:39 AM Jan Lukavský <je...@seznam.cz> wrote:
>
> On 2/27/24 19:22, Robert Bradshaw via dev wrote:
> > On Mon, Feb 26, 2024 at 11:45 AM Kenneth Knowles <k...@apache.org> wrote:
> >> Pulling out focus points:
> >>
> >> On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev
> >> <dev@beam.apache.org> wrote:
> >>> I can't act on something yet [...] but I expect to be able to [...] at
> >>> some time in the processing-time future.
> >> I like this as a clear and internally-consistent feature description. It
> >> describes ProcessContinuation and those timers which serve the same
> >> purpose as ProcessContinuation.
> >>
> >> On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev
> >> <dev@beam.apache.org> wrote:
> >>> I can't think of a batch or streaming scenario where it would be correct
> >>> to not wait at least that long
> >> The main reason we created timers: to take action in the absence of data.
> >> The archetypal use case for processing time timers was/is "flush data from
> >> state if it has been sitting there too long". For this use case, the right
> >> behavior for batch is to skip the timer. It is actually basically
> >> incorrect to wait.
> > Good point calling out the distinction between "I need to wait in case
> > there's more data." and "I need to wait for something external." We
> > can't currently distinguish between the two, but a batch runner can
> > say something definitive about the first. Feels like we need a new
> > primitive (or at least new signaling information on our existing
> > primitive).
> Runners signal end of data to a DoFn via (input) watermark. Is there a
> need for additional information?
Yes, and I agree that watermarks/event timestamps are a much better way to track data completeness (where possible). Unfortunately, processing-time timers don't specify whether they're waiting for additional data or for an external/environmental change, so we can't use the (event-time) watermark to determine whether they're safe to trigger.
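To make the gap concrete, here is a small toy simulation (not Beam code; names like BatchRunner and Timer are hypothetical). At end of bounded input, a batch runner can advance the event-time watermark to +infinity and safely fire every event-time timer, but it has no way to tell a "flush stale state" processing-time timer (safe to fire immediately) apart from a "wait for something external" one (should actually wait), so it must pick one policy for both:

```python
import heapq
from dataclasses import dataclass, field

EVENT_TIME = "event"
PROCESSING_TIME = "processing"

@dataclass(order=True)
class Timer:
    fire_at: float                              # timestamp in the timer's own time domain
    kind: str = field(compare=False)            # EVENT_TIME or PROCESSING_TIME
    callback: object = field(compare=False)

class BatchRunner:
    """Toy batch runner: collects timers, then drains them at end of input."""

    def __init__(self):
        self._timers = []
        self.fired = []                         # kinds of timers fired, in order

    def set_timer(self, fire_at, kind, callback):
        heapq.heappush(self._timers, Timer(fire_at, kind, callback))

    def drain(self):
        # End of bounded input: the event-time watermark jumps to +inf, so
        # every event-time timer is unambiguously ready. Processing-time
        # timers carry no signal about *why* they were set, so this runner
        # fires them eagerly too -- correct for the "flush" archetype,
        # wrong for a timer genuinely waiting on an external change.
        while self._timers:
            t = heapq.heappop(self._timers)
            self.fired.append(t.kind)
            t.callback()
```

The point of the sketch is the comment in drain(): the eager policy is forced to be global because the timer itself doesn't distinguish the two intents, which is exactly the missing signal discussed above.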