On Wed, May 5, 2021 at 9:55 AM Reuven Lax <re...@google.com> wrote: > > > On Wed, May 5, 2021 at 9:36 AM Jan Lukavský <je...@seznam.cz> wrote: > >> Yes, what I meant was the distinguishing of cleanup timers from plain >> user other timers - that seems to be due to the fact that they fire base on >> different watermark (input/output). >> > They don't though, because AFAIK no runner has the capability to fire > timers based on output watermarks. Beam relies on timer ordering to fire > cleanup timers, which has a somewhat similar effect - by the time the > cleanup timer fires, all other timers are guaranteed to have fired. >
Again, cleanup timers are not part of the Beam model. Beam doesn't rely on ordering of timers. Beam doesn't even rely on the _existence_ of cleanup timers. Beam doesn't even rely on the existence of _cleanup_. Kenn > > >> And firing timers based on output watermark might be actually a good >> user-facing feature, because that might help tracking output watermark in >> transforms that want to deal with potentially droppable data downstream >> (the input would have to be re-windowed to global window, of course). I >> don't know if there are other use-cases, if not maybe it might be >> sufficient to create a DroppableDataSplit transform, that would create a >> PCollectionTuple with droppable and other data. But that was just an idea >> when Kenn mentioned that the cleanup timers "fire differently" - I >> generally think that when there is a need for a different behavior, than it >> might signal there is something possibly fundamental. >> > > We provide OnWindowExpiration, which is basically this. It fires after all > other windows and timers, but just before the cleanup happens (in fact we > currently piggy back on the cleanup timer to implement OnWindowExpiration). > >> Jan >> On 5/5/21 5:11 PM, Reuven Lax wrote: >> >> This is one way to put it. I think in practice Beam guarantees that >> timers fire in order for a given key (though there is still a bit of a bug >> around looping timers - the fix for that got rolled back). This means that >> as long as the runner sets the cleanup timer to be 1ms passed the end of >> the window (plus allowed lateness), it's guaranteed to be the last timer >> that fires for that window. >> >> >> >> On Wed, May 5, 2021 at 2:41 AM Jan Lukavský <je...@seznam.cz> wrote: >> >>> Hm, one thing in the last paragraph seems there might be some logical >>> gap. >>> >>> > For a runner author to implement a "cleanup timer" requires a >>> different mechanism. A window expires when *both* the input element >>> watermark *and* the timer watermark are past the expiry time. In other >>> words, the cleanup timer fires according to the minimum of these >>> watermarks, combined. It *cannot* fire according to the input element >>> watermark. If you naively try to implement it as a user timer, it will be >>> incorrect. Incidentally this is why @OnWindowExpiration is a meaningful >>> feature. >>> The description describes a timer that fires not according to input >>> watermark, but according to the output watermark (once the output watermark >>> reaches certain point in time). That logically implies, that such a timer >>> cannot have non-droppable output (at least if its output timestamp belongs >>> to the respective window) and cannot create a watermark hold (because that >>> would block the progress of the output watermark and might cause the timer >>> to not fire ever). This maybe might be a useful user-feature as well, >>> probably again mostly related to how user-code might want to deal with >>> droppable data. >>> >>> Jan >>> >>> On 5/4/21 6:41 PM, Kenneth Knowles wrote: >>> >>> Mean to also add +Reuven Lax <re...@google.com> >>> >>> On Tue, May 4, 2021 at 9:41 AM Kenneth Knowles <k...@apache.org> wrote: >>> >>>> Explicitly pinging a couple folks who were involved in the original >>>> change which yours essentially reverts. There's a model question here that >>>> I want to clarify on-list: >>>> >>>> When you have a ParDo setting timers, you have an additional watermark >>>> that must be considered: >>>> >>>> - input element watermark >>>> - output watermark >>>> - *(user) timer watermark* >>>> >>>> The timer watermark is an input to the ParDo. Sometimes you might think >>>> of the "timer channel" as a self loop, where each timer is an element. Each >>>> timer has a timestamp (the output timestamp) and separately some >>>> instructions on when to deliver that timer. This is the same as the usual >>>> difference between event time and processing time. >>>> >>>> The instruction on when to deliver a timer can have two forms: >>>> >>>> - wait a certain amount of processing time >>>> - deliver the timer when the *input element watermark* reaches a time X >>>> >>>> Here is an important point: "cleanup timers" are *not* user timers. >>>> They are an implementation detail. They are not part of the model. The >>>> runner's job is to reclaim resources as windows expire. A user should never >>>> be reasoning about how their timers relate to cleanup timers (except for >>>> resource consumption). Because there is no relationship except that the >>>> cleanup should happen "eventually" and invisibly. >>>> >>>> For a runner author to implement a "cleanup timer" requires a different >>>> mechanism. A window expires when *both* the input element watermark *and* >>>> the timer watermark are past the expiry time. In other words, the cleanup >>>> timer fires according to the minimum of these watermarks, combined. It >>>> *cannot* fire according to the input element watermark. If you naively try >>>> to implement it as a user timer, it will be incorrect. Incidentally this is >>>> why @OnWindowExpiration is a meaningful feature. >>>> >>>> Kenn >>>> >>>> On Tue, May 4, 2021 at 4:29 AM Jan Lukavský <je...@seznam.cz> wrote: >>>> >>>>> Hi Kenn, >>>>> >>>>> I created BEAM-12276 [1] with PR [2]. >>>>> >>>>> Jan >>>>> >>>>> [1] https://issues.apache.org/jira/browse/BEAM-12276 >>>>> >>>>> [2] https://github.com/apache/beam/pull/14718 >>>>> On 5/3/21 7:46 PM, Kenneth Knowles wrote: >>>>> >>>>> This seems like just a bug. If you set a timer for X and have output >>>>> timestamp Y where X < Y this should be fine. Is the problem the current >>>>> input watermark? Are you trying to set a timer with output timestamp that >>>>> is already past? I think that should be allowed, too, as long as the >>>>> window >>>>> is not expired, but I may be missing something. >>>>> >>>>> Some greater detail would be useful - maybe the full stack trace >>>>> and/or a failing unit test in a PR? >>>>> >>>>> Kenn >>>>> >>>>> On Thu, Apr 29, 2021 at 12:51 AM Jan Lukavský <je...@seznam.cz> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I have come across a bug with timer output timestamp - when using >>>>>> event >>>>>> time and relative timers, setting the timer can arbitrarily throw >>>>>> IllegalArgumentException if the firing timestamp (input watermark) is >>>>>> ahead of the output timestamp (like >>>>>> .java.lang.IllegalArgumentException: >>>>>> Attempted to set an event-time timer with an output timestamp of >>>>>> 2021-04-29T07:16:19.369Z that is after the timer firing timestamp >>>>>> -290308-12-21T19:59:05.225Z). But there is no way to access the >>>>>> firing >>>>>> timestamp from user code. This means that the use has to either catch >>>>>> the IllegalArgumentException, or not use this construct at all. >>>>>> >>>>>> Catching the exception should probably not be part of a contract, so >>>>>> we >>>>>> should do one of the following: >>>>>> >>>>>> a) either throw the exception right away and disable using relative >>>>>> timers with output timestamp completely, or >>>>>> >>>>>> b) support it correctly >>>>>> >>>>>> What is the actual reason not to support output timestamps, that are >>>>>> ahead of firing timestamp? From my understanding, that should not be >>>>>> an >>>>>> issue, because there is TimestampCombiner.EARLIEST on the >>>>>> watermarkholdstate that corresponds to the output timestamp. If that >>>>>> is >>>>>> correct can we simply remove the check? >>>>>> >>>>>> Jan >>>>>> >>>>>>