laraschmidt commented on pull request #15540:
URL: https://github.com/apache/beam/pull/15540#issuecomment-936865620
> > > Won't this allow for infinite skew? If you have a timer at `X` and a
skew of `-1`, then the first time the timer is processed you can output at time
`X-1`, and when it gets scheduled again you can now output at `X-2`, since
the new timer's timestamp is `X-1`.
> >
> >
> > So my understanding of the reason for these checks is to stop people
from doing the wrong thing without realizing it. We don't even take any
different action based on this variable. It seems okay to apply this to each
specific output timestamp and let you skew more if you chain timers in this
fashion.
> > On a more practical note, there are reasons why you might want a timer to
output an earlier element if you've properly set up watermark holds. There's
currently no way to do that so we need some allowance. It would probably be
better if we could constrain skew from the first output timestamp but I don't
think that's available in the later timers, right?
> > If you disagree with the approach, I can bring this up on the email
thread for others to chime in in case they are not checking here.
>
> I think users will be surprised when their data is dropped as late
because they output past the watermark skew bound. The existing logic
explicitly guarded against this because it would be surprising for users, so
I do believe it is important enough to discuss whether there is another
approach to solve this or whether we are OK with this happening.

We chatted a bit about this offline. There's actually no guarantee that the
watermark is held back when using DoFn#getAllowedTimestampSkew. Setting
allowedTimestampSkew just removes the check that exists to keep us from
accidentally dropping late data. See the javadoc [1] and relevant reply from Jan [2].
[1]
https://beam.apache.org/releases/javadoc/2.5.0/org/apache/beam/sdk/transforms/DoFn.html#getAllowedTimestampSkew--
[2]
https://lists.apache.org/thread.html/r34c70d8a5f213f7bd2f4557019e27b7f07f5120d0a8794512c88568c%40%3Cdev.beam.apache.org%3E
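
For anyone following along, here is a rough sketch of the chained-timer
scenario from the quoted question above. It is a plain Java model and does not
use any Beam API: the class name, the method name, and the assumption that each
firing re-sets the timer at its own output timestamp are all hypothetical, made
up for illustration. The skew is written as a positive magnitude of 1.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainedTimerSkew {
    /**
     * Hypothetical model, not Beam code: each firing outputs at
     * (timer timestamp - allowed skew) and then re-sets the timer
     * at that output timestamp, so the per-firing skew accumulates.
     */
    static List<Long> outputTimestamps(long firstTimerTimestamp, long allowedSkew, int firings) {
        List<Long> outputs = new ArrayList<>();
        long timerTimestamp = firstTimerTimestamp;
        for (int i = 0; i < firings; i++) {
            // The check is applied per firing, relative to the current timer timestamp.
            long output = timerTimestamp - allowedSkew;
            outputs.add(output);
            // Chaining: the next timer is set at the timestamp we just output at.
            timerTimestamp = output;
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Timer at X = 100, skew of 1, three chained firings.
        System.out.println(outputTimestamps(100L, 1L, 3)); // prints [99, 98, 97]
    }
}
```

Each individual output stays within the skew of its own timer, but the total
drift from the first timer grows with every firing, and nothing in this model
holds the watermark back.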