On Wed, May 5, 2021 at 9:55 AM Reuven Lax <re...@google.com> wrote:

>
>
> On Wed, May 5, 2021 at 9:36 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Yes, what I meant was the distinguishing of cleanup timers from plain
>> user other timers - that seems to be due to the fact that they fire base on
>> different watermark (input/output).
>>
> They don't though, because AFAIK no runner has the capability to fire
> timers based on output watermarks. Beam relies on timer ordering to fire
> cleanup timers, which has a somewhat similar effect - by the time the
> cleanup timer fires, all other timers are guaranteed to have fired.
>

Again, cleanup timers are not part of the Beam model. Beam doesn't rely on
ordering of timers. Beam doesn't even rely on the _existence_ of cleanup
timers. Beam doesn't even rely on the existence of _cleanup_.

Kenn


>
>
>> And firing timers based on output watermark might be actually a good
>> user-facing feature, because that might help tracking output watermark in
>> transforms that want to deal with potentially droppable data downstream
>> (the input would have to be re-windowed to global window, of course). I
>> don't know if there are other use-cases, if not maybe it might be
>> sufficient to create a DroppableDataSplit transform, that would create a
>> PCollectionTuple with droppable and other data. But that was just an idea
>> when Kenn mentioned that the cleanup timers "fire differently" - I
>> generally think that when there is a need for a different behavior, than it
>> might signal there is something possibly fundamental.
>>
>
> We provide OnWindowExpiration, which is basically this. It fires after all
> other windows and timers, but just before the cleanup happens (in fact we
> currently piggy back on the cleanup timer to implement OnWindowExpiration).
>
>>  Jan
>> On 5/5/21 5:11 PM, Reuven Lax wrote:
>>
>> This is one way to put it. I think in practice Beam guarantees that
>> timers fire in order for a given key (though there is still a bit of a bug
>> around looping timers - the fix for that got rolled back). This means that
>> as long as the runner sets the cleanup timer to be 1ms passed the end of
>> the window (plus allowed lateness), it's guaranteed to be the last timer
>> that fires for that window.
>>
>>
>>
>> On Wed, May 5, 2021 at 2:41 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Hm, one thing in the last paragraph seems there might be some logical
>>> gap.
>>>
>>> > For a runner author to implement a "cleanup timer" requires a
>>> different mechanism. A window expires when *both* the input element
>>> watermark *and* the timer watermark are past the expiry time. In other
>>> words, the cleanup timer fires according to the minimum of these
>>> watermarks, combined. It *cannot* fire according to the input element
>>> watermark. If you naively try to implement it as a user timer, it will be
>>> incorrect. Incidentally this is why @OnWindowExpiration is a meaningful
>>> feature.
>>> The description describes a timer that fires not according to input
>>> watermark, but according to the output watermark (once the output watermark
>>> reaches certain point in time). That logically implies, that such a timer
>>> cannot have non-droppable output (at least if its output timestamp belongs
>>> to the respective window) and cannot create a watermark hold (because that
>>> would block the progress of the output watermark and might cause the timer
>>> to not fire ever). This maybe might be a useful user-feature as well,
>>> probably again mostly related to how user-code might want to deal with
>>> droppable data.
>>>
>>>  Jan
>>>
>>> On 5/4/21 6:41 PM, Kenneth Knowles wrote:
>>>
>>> Mean to also add +Reuven Lax <re...@google.com>
>>>
>>> On Tue, May 4, 2021 at 9:41 AM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> Explicitly pinging a couple folks who were involved in the original
>>>> change which yours essentially reverts. There's a model question here that
>>>> I want to clarify on-list:
>>>>
>>>> When you have a ParDo setting timers, you have an additional watermark
>>>> that must be considered:
>>>>
>>>>  - input element watermark
>>>>  - output watermark
>>>>  - *(user) timer watermark*
>>>>
>>>> The timer watermark is an input to the ParDo. Sometimes you might think
>>>> of the "timer channel" as a self loop, where each timer is an element. Each
>>>> timer has a timestamp (the output timestamp) and separately some
>>>> instructions on when to deliver that timer. This is the same as the usual
>>>> difference between event time and processing time.
>>>>
>>>> The instruction on when to deliver a timer can have two forms:
>>>>
>>>>  - wait a certain amount of processing time
>>>>  - deliver the timer when the *input element watermark* reaches a time X
>>>>
>>>> Here is an important point: "cleanup timers" are *not* user timers.
>>>> They are an implementation detail. They are not part of the model. The
>>>> runner's job is to reclaim resources as windows expire. A user should never
>>>> be reasoning about how their timers relate to cleanup timers (except for
>>>> resource consumption). Because there is no relationship except that the
>>>> cleanup should happen "eventually" and invisibly.
>>>>
>>>> For a runner author to implement a "cleanup timer" requires a different
>>>> mechanism. A window expires when *both* the input element watermark *and*
>>>> the timer watermark are past the expiry time. In other words, the cleanup
>>>> timer fires according to the minimum of these watermarks, combined. It
>>>> *cannot* fire according to the input element watermark. If you naively try
>>>> to implement it as a user timer, it will be incorrect. Incidentally this is
>>>> why @OnWindowExpiration is a meaningful feature.
>>>>
>>>> Kenn
>>>>
>>>> On Tue, May 4, 2021 at 4:29 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>
>>>>> Hi Kenn,
>>>>>
>>>>> I created BEAM-12276 [1] with PR [2].
>>>>>
>>>>>  Jan
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/BEAM-12276
>>>>>
>>>>> [2] https://github.com/apache/beam/pull/14718
>>>>> On 5/3/21 7:46 PM, Kenneth Knowles wrote:
>>>>>
>>>>> This seems like just a bug. If you set a timer for X and have output
>>>>> timestamp Y where X < Y this should be fine. Is the problem the current
>>>>> input watermark? Are you trying to set a timer with output timestamp that
>>>>> is already past? I think that should be allowed, too, as long as the 
>>>>> window
>>>>> is not expired, but I may be missing something.
>>>>>
>>>>> Some greater detail would be useful - maybe the full stack trace
>>>>> and/or a failing unit test in a PR?
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Thu, Apr 29, 2021 at 12:51 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have come across a bug with timer output timestamp - when using
>>>>>> event
>>>>>> time and relative timers, setting the timer can arbitrarily throw
>>>>>> IllegalArgumentException if the firing timestamp (input watermark) is
>>>>>> ahead of the output timestamp (like
>>>>>> .java.lang.IllegalArgumentException:
>>>>>> Attempted to set an event-time timer with an output timestamp of
>>>>>> 2021-04-29T07:16:19.369Z that is after the timer firing timestamp
>>>>>> -290308-12-21T19:59:05.225Z). But there is no way to access the
>>>>>> firing
>>>>>> timestamp from user code. This means that the use has to either catch
>>>>>> the IllegalArgumentException, or not use this construct at all.
>>>>>>
>>>>>> Catching the exception should probably not be part of a contract, so
>>>>>> we
>>>>>> should do one of the following:
>>>>>>
>>>>>>   a) either throw the exception right away and disable using relative
>>>>>> timers with output timestamp completely, or
>>>>>>
>>>>>>   b) support it correctly
>>>>>>
>>>>>> What is the actual reason not to support output timestamps, that are
>>>>>> ahead of firing timestamp? From my understanding, that should not be
>>>>>> an
>>>>>> issue, because there is TimestampCombiner.EARLIEST on the
>>>>>> watermarkholdstate that corresponds to the output timestamp. If that
>>>>>> is
>>>>>> correct can we simply remove the check?
>>>>>>
>>>>>>   Jan
>>>>>>
>>>>>>

Reply via email to