On Tue, Feb 27, 2024 at 10:39 AM Jan Lukavský <je...@seznam.cz> wrote:
>
> On 2/27/24 19:22, Robert Bradshaw via dev wrote:
> > On Mon, Feb 26, 2024 at 11:45 AM Kenneth Knowles <k...@apache.org> wrote:
> >> Pulling out focus points:
> >>
> >> On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev
> >> <dev@beam.apache.org> wrote:
> >>> I can't act on something yet [...] but I expect to be able to [...] at
> >>> some time in the processing-time future.
> >> I like this as a clear and internally-consistent feature description. It
> >> describes ProcessContinuation and those timers which serve the same
> >> purpose as ProcessContinuation.
> >>
> >> On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev
> >> <dev@beam.apache.org> wrote:
> >>> I can't think of a batch or streaming scenario where it would be correct
> >>> to not wait at least that long
> >> The main reason we created timers: to take action in the absence of data.
> >> The archetypal use case for processing time timers was/is "flush data from
> >> state if it has been sitting there too long". For this use case, the right
> >> behavior for batch is to skip the timer. It is actually basically
> >> incorrect to wait.
> > Good point calling out the distinction between "I need to wait in case
> > there's more data." and "I need to wait for something external." We
> > can't currently distinguish between the two, but a batch runner can
> > say something definitive about the first. Feels like we need a new
> > primitive (or at least new signaling information on our existing
> > primitive).
> Runners signal end of data to a DoFn via (input) watermark. Is there a
> need for additional information?
Yes, and I agree that watermarks/event timestamps are a much better way to track data completeness (where possible). Unfortunately, processing-time timers don't specify whether they're waiting for additional data or for an external/environmental change, so we can't use the (event-time) watermark to determine whether they're safe to trigger.
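To make the gap concrete, here is a small toy simulation (not Beam code; names like BatchRunner and Timer are hypothetical). At end of bounded input, a batch runner can advance the event-time watermark to +infinity and safely fire every event-time timer, but it has no way to tell a "flush stale state" processing-time timer (safe to fire immediately) apart from a "wait for something external" one (should actually wait), so it must pick one policy for both:

```python
import heapq
from dataclasses import dataclass, field

EVENT_TIME = "event"
PROCESSING_TIME = "processing"

@dataclass(order=True)
class Timer:
    fire_at: float                              # timestamp in the timer's own time domain
    kind: str = field(compare=False)            # EVENT_TIME or PROCESSING_TIME
    callback: object = field(compare=False)

class BatchRunner:
    """Toy batch runner: collects timers, then drains them at end of input."""

    def __init__(self):
        self._timers = []
        self.fired = []                         # kinds of timers fired, in order

    def set_timer(self, fire_at, kind, callback):
        heapq.heappush(self._timers, Timer(fire_at, kind, callback))

    def drain(self):
        # End of bounded input: the event-time watermark jumps to +inf, so
        # every event-time timer is unambiguously ready. Processing-time
        # timers carry no signal about *why* they were set, so this runner
        # fires them eagerly too -- correct for the "flush" archetype,
        # wrong for a timer genuinely waiting on an external change.
        while self._timers:
            t = heapq.heappop(self._timers)
            self.fired.append(t.kind)
            t.callback()
```

The point of the sketch is the comment in drain(): the eager policy is forced to be global because the timer itself doesn't distinguish the two intents, which is exactly the missing signal discussed above.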