laraschmidt commented on pull request #15540:
URL: https://github.com/apache/beam/pull/15540#issuecomment-936865620


   > > > Won't this allow for infinite skew, since if you have a timer at `X` and a 
skew of `-1` then the first time the timer is processed you can output at time 
`X-1`, and when it gets scheduled again you can now output at `X-2` since the 
new timer's timestamp is `X-1`?
   > > 
   > > 
   > > So my understanding of the reason for these checks is to stop people 
from doing the wrong thing without realizing it. We don't even take any 
different action based on this variable. It seems okay to apply this to each 
specific output timestamp and let you skew more if you chain timers in this 
fashion.
   > > On a more practical note, there are reasons why you might want a timer to 
output an earlier element if you've properly set up watermark holds. There's 
currently no way to do that, so we need some allowance. It would probably be 
better if we could constrain skew from the first output timestamp, but I don't 
think that's available in the later timers, right?
   > > If you disagree with the approach, I can bring this up on the email 
thread for others to chime in in case they are not checking here.
   > 
   > I think users will be surprised that their data will be dropped as late 
once they pass the watermark skew bound if they output past it. The existing 
logic explicitly guarded against this because it would surprise users, so I 
believe it is important to discuss whether there is another approach to solve 
this or whether we are OK with this happening.
   
   We chatted a bit about this offline. There's actually no guarantee that the 
watermark is held back when using `DoFn#getAllowedTimestampSkew`. The allowed 
timestamp skew only relaxes the check that exists to keep us from accidentally 
emitting data that is then dropped as late. See the Javadoc [1] and the 
relevant reply from Jan [2].
   
    [1] 
https://beam.apache.org/releases/javadoc/2.5.0/org/apache/beam/sdk/transforms/DoFn.html#getAllowedTimestampSkew--
    [2] 
https://lists.apache.org/thread.html/r34c70d8a5f213f7bd2f4557019e27b7f07f5120d0a8794512c88568c%40%3Cdev.beam.apache.org%3E
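
To make the chaining concern quoted above concrete, here is a toy model. It is plain Python, not Beam code, and the function names and integer timestamps are made up for illustration. It assumes the skew check is applied relative to the timestamp of the timer that is currently firing, and that each firing re-sets the timer at the earliest timestamp it is allowed to output. The second function sketches the alternative of bounding skew from the first timer's timestamp.

```python
# Toy model only: this does not use or reproduce any Beam API.
def chained_output_timestamps(first_timer_ts, allowed_skew, firings):
    """Earliest allowed output timestamp at each firing of a re-set timer.

    Assumption: the skew bound is checked against the firing timer's own
    timestamp, and each firing re-sets the timer at the earliest
    timestamp it was allowed to output.
    """
    timer_ts = first_timer_ts
    outputs = []
    for _ in range(firings):
        earliest = timer_ts - allowed_skew  # per-firing bound
        outputs.append(earliest)
        timer_ts = earliest  # the next timer inherits the skewed timestamp
    return outputs


def bounded_output_timestamps(first_timer_ts, allowed_skew, firings):
    """Hypothetical alternative: bound skew from the first timer's timestamp."""
    return [first_timer_ts - allowed_skew] * firings


if __name__ == "__main__":
    # With X = 100 and an allowed skew of 1, chaining drifts by 1 per firing.
    print(chained_output_timestamps(100, 1, 3))  # [99, 98, 97]
    print(bounded_output_timestamps(100, 1, 3))  # [99, 99, 99]
```

In this model the total drift grows linearly with the number of chained firings, which is the "infinite skew" case from the first review comment.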
   

