[ 
https://issues.apache.org/jira/browse/BEAM-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530105#comment-17530105
 ] 

Reuven Lax commented on BEAM-14239:
-----------------------------------

strawperson proposal 1: we stop adding output timestamp to the tag, and every 
time a timer fires we read the watermark hold state to see what the output 
timestamp is.

main problem with this is that the extra read is expensive, and this strategy 
will cause us to do it for _every_ timer, even the ones that never set an 
output timestamp. so that leads me to

strawperson proposal 2: use the extra part of the timer tag to indicate that a 
watermark hold is needed, storing a constant value there. We fetch the 
watermark hold only if this bit is present in the tag.

Two problems with this proposal:

  1. I'm not sure how to issue state fetches from inside 
WindmillTimerInternals. Presumably it's possible to plumb something down.

 2. What if user code resets a timer that was set with an output timestamp and 
doesn't set an output timestamp? This will cause us to stop setting the extra 
bit in the tag, and again both timers will fire.

We could also change the API so that the user needs to statically declare 
whether given timers will have holds or not. However this is a fairly 
backwards-incompatible change to the timer API. 

> Changing the output timestamp of a timer does not clear the previously set 
> timer
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-14239
>                 URL: https://issues.apache.org/jira/browse/BEAM-14239
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.37.0
>            Reporter: Steve Niemitz
>            Priority: P1
>         Attachments: image-2022-04-04-09-57-29-583.png
>
>
> While looking into an unrelated bug with GroupIntoBatches, I noticed that it 
> seems like changing the output timestamp of a timer does not clear the 
> existing timer, and instead creates a new one.  
> This kind of makes sense looking at the implementation of timers in Dataflow, 
> the output timestamp is encoded into the timer ID, but this is not reflected 
> in the timerStillPresent map in WindmillTimerInternals.  It seems like it 
> should be, and the previous timer should be deleted.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to