Hm, one thing in the last paragraph seems there might be some logical gap.
> For a runner author to implement a "cleanup timer" requires a
different mechanism. A window expires when *both* the input element
watermark *and* the timer watermark are past the expiry time. In other
words, the cleanup timer fires according to the minimum of these
watermarks, combined. It *cannot* fire according to the input element
watermark. If you naively try to implement it as a user timer, it will
be incorrect. Incidentally this is why @OnWindowExpiration is a
meaningful feature.
The description describes a timer that fires not according to input
watermark, but according to the output watermark (once the output
watermark reaches certain point in time). That logically implies, that
such a timer cannot have non-droppable output (at least if its output
timestamp belongs to the respective window) and cannot create a
watermark hold (because that would block the progress of the output
watermark and might cause the timer to not fire ever). This maybe might
be a useful user-feature as well, probably again mostly related to how
user-code might want to deal with droppable data.
Jan
On 5/4/21 6:41 PM, Kenneth Knowles wrote:
Mean to also add +Reuven Lax <mailto:re...@google.com>
On Tue, May 4, 2021 at 9:41 AM Kenneth Knowles <k...@apache.org
<mailto:k...@apache.org>> wrote:
Explicitly pinging a couple folks who were involved in the
original change which yours essentially reverts. There's a model
question here that I want to clarify on-list:
When you have a ParDo setting timers, you have an additional
watermark that must be considered:
- input element watermark
- output watermark
- *(user) timer watermark*
The timer watermark is an input to the ParDo. Sometimes you might
think of the "timer channel" as a self loop, where each timer is
an element. Each timer has a timestamp (the output timestamp) and
separately some instructions on when to deliver that timer. This
is the same as the usual difference between event time and
processing time.
The instruction on when to deliver a timer can have two forms:
- wait a certain amount of processing time
- deliver the timer when the *input element watermark* reaches a
time X
Here is an important point: "cleanup timers" are *not* user
timers. They are an implementation detail. They are not part of
the model. The runner's job is to reclaim resources as windows
expire. A user should never be reasoning about how their timers
relate to cleanup timers (except for resource consumption).
Because there is no relationship except that the cleanup should
happen "eventually" and invisibly.
For a runner author to implement a "cleanup timer" requires a
different mechanism. A window expires when *both* the input
element watermark *and* the timer watermark are past the expiry
time. In other words, the cleanup timer fires according to the
minimum of these watermarks, combined. It *cannot* fire according
to the input element watermark. If you naively try to implement it
as a user timer, it will be incorrect. Incidentally this is
why @OnWindowExpiration is a meaningful feature.
Kenn
On Tue, May 4, 2021 at 4:29 AM Jan Lukavský <je...@seznam.cz
<mailto:je...@seznam.cz>> wrote:
Hi Kenn,
I created BEAM-12276 [1] with PR [2].
Jan
[1] https://issues.apache.org/jira/browse/BEAM-12276
<https://issues.apache.org/jira/browse/BEAM-12276>
[2] https://github.com/apache/beam/pull/14718
<https://github.com/apache/beam/pull/14718>
On 5/3/21 7:46 PM, Kenneth Knowles wrote:
This seems like just a bug. If you set a timer for X and have
output timestamp Y where X < Y this should be fine. Is the
problem the current input watermark? Are you trying to set a
timer with output timestamp that is already past? I think
that should be allowed, too, as long as the window is not
expired, but I may be missing something.
Some greater detail would be useful - maybe the full stack
trace and/or a failing unit test in a PR?
Kenn
On Thu, Apr 29, 2021 at 12:51 AM Jan Lukavský
<je...@seznam.cz <mailto:je...@seznam.cz>> wrote:
Hi,
I have come across a bug with timer output timestamp -
when using event
time and relative timers, setting the timer can
arbitrarily throw
IllegalArgumentException if the firing timestamp (input
watermark) is
ahead of the output timestamp (like
.java.lang.IllegalArgumentException:
Attempted to set an event-time timer with an output
timestamp of
2021-04-29T07:16:19.369Z that is after the timer firing
timestamp
-290308-12-21T19:59:05.225Z). But there is no way to
access the firing
timestamp from user code. This means that the use has to
either catch
the IllegalArgumentException, or not use this construct
at all.
Catching the exception should probably not be part of a
contract, so we
should do one of the following:
a) either throw the exception right away and disable
using relative
timers with output timestamp completely, or
b) support it correctly
What is the actual reason not to support output
timestamps, that are
ahead of firing timestamp? From my understanding, that
should not be an
issue, because there is TimestampCombiner.EARLIEST on the
watermarkholdstate that corresponds to the output
timestamp. If that is
correct can we simply remove the check?
Jan