Re: Timer.withOutputTimestamp().offset().setRelative seems unusable with event time

Jan Lukavský Wed, 05 May 2021 02:41:31 -0700

Hm, one thing in the last paragraph seems there might be some logical gap.

> For a runner author to implement a "cleanup timer" requires adifferent mechanism. A window expires when *both* the input elementwatermark *and* the timer watermark are past the expiry time. In otherwords, the cleanup timer fires according to the minimum of thesewatermarks, combined. It *cannot* fire according to the input elementwatermark. If you naively try to implement it as a user timer, it willbe incorrect. Incidentally this is why @OnWindowExpiration is ameaningful feature.

The description describes a timer that fires not according to inputwatermark, but according to the output watermark (once the outputwatermark reaches certain point in time). That logically implies, thatsuch a timer cannot have non-droppable output (at least if its outputtimestamp belongs to the respective window) and cannot create awatermark hold (because that would block the progress of the outputwatermark and might cause the timer to not fire ever). This maybe mightbe a useful user-feature as well, probably again mostly related to howuser-code might want to deal with droppable data.


 Jan

On 5/4/21 6:41 PM, Kenneth Knowles wrote:

Mean to also add +Reuven Lax <mailto:[email protected]>

On Tue, May 4, 2021 at 9:41 AM Kenneth Knowles <[email protected]<mailto:[email protected]>> wrote:


    Explicitly pinging a couple folks who were involved in the
    original change which yours essentially reverts. There's a model
    question here that I want to clarify on-list:

    When you have a ParDo setting timers, you have an additional
    watermark that must be considered:

     - input element watermark
     - output watermark
     - *(user) timer watermark*

    The timer watermark is an input to the ParDo. Sometimes you might
    think of the "timer channel" as a self loop, where each timer is
    an element. Each timer has a timestamp (the output timestamp) and
    separately some instructions on when to deliver that timer. This
    is the same as the usual difference between event time and
    processing time.

    The instruction on when to deliver a timer can have two forms:

     - wait a certain amount of processing time
     - deliver the timer when the *input element watermark* reaches a
    time X

    Here is an important point: "cleanup timers" are *not* user
    timers. They are an implementation detail. They are not part of
    the model. The runner's job is to reclaim resources as windows
    expire. A user should never be reasoning about how their timers
    relate to cleanup timers (except for resource consumption).
    Because there is no relationship except that the cleanup should
    happen "eventually" and invisibly.

    For a runner author to implement a "cleanup timer" requires a
    different mechanism. A window expires when *both* the input
    element watermark *and* the timer watermark are past the expiry
    time. In other words, the cleanup timer fires according to the
    minimum of these watermarks, combined. It *cannot* fire according
    to the input element watermark. If you naively try to implement it
    as a user timer, it will be incorrect. Incidentally this is
    why @OnWindowExpiration is a meaningful feature.

    Kenn

    On Tue, May 4, 2021 at 4:29 AM Jan Lukavský <[email protected]
    <mailto:[email protected]>> wrote:

        Hi Kenn,

        I created BEAM-12276 [1] with PR [2].

         Jan

        [1] https://issues.apache.org/jira/browse/BEAM-12276
        <https://issues.apache.org/jira/browse/BEAM-12276>

        [2] https://github.com/apache/beam/pull/14718
        <https://github.com/apache/beam/pull/14718>

        On 5/3/21 7:46 PM, Kenneth Knowles wrote:

        This seems like just a bug. If you set a timer for X and have
        output timestamp Y where X < Y this should be fine. Is the
        problem the current input watermark? Are you trying to set a
        timer with output timestamp that is already past? I think
        that should be allowed, too, as long as the window is not
        expired, but I may be missing something.

        Some greater detail would be useful - maybe the full stack
        trace and/or a failing unit test in a PR?

        Kenn

        On Thu, Apr 29, 2021 at 12:51 AM Jan Lukavský
        <[email protected] <mailto:[email protected]>> wrote:

            Hi,

            I have come across a bug with timer output timestamp -
            when using event
            time and relative timers, setting the timer can
            arbitrarily throw
            IllegalArgumentException if the firing timestamp (input
            watermark) is
            ahead of the output timestamp (like
            .java.lang.IllegalArgumentException:
            Attempted to set an event-time timer with an output
            timestamp of
            2021-04-29T07:16:19.369Z that is after the timer firing
            timestamp
            -290308-12-21T19:59:05.225Z). But there is no way to
            access the firing
            timestamp from user code. This means that the use has to
            either catch
            the IllegalArgumentException, or not use this construct
            at all.

            Catching the exception should probably not be part of a
            contract, so we
            should do one of the following:

              a) either throw the exception right away and disable
            using relative
            timers with output timestamp completely, or

              b) support it correctly

            What is the actual reason not to support output
            timestamps, that are
            ahead of firing timestamp? From my understanding, that
            should not be an
            issue, because there is TimestampCombiner.EARLIEST on the
            watermarkholdstate that corresponds to the output
            timestamp. If that is
            correct can we simply remove the check?

              Jan

Re: Timer.withOutputTimestamp().offset().setRelative seems unusable with event time

Reply via email to