[ 
https://issues.apache.org/jira/browse/BEAM-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869637#comment-16869637
 ] 

Steve Niemitz commented on BEAM-7614:
-------------------------------------

oh, another odd thing I should mention.  I originally found this because a 
fused step to the timer firing was throwing an exception on the null key.  I 
would have expected this to then get "stuck" retrying the timer forever, but 
instead it just threw an exception once and then likely never tried again.  
Probably because the timer had actually already succeeded the first time?

> Event-time timers seem to sometimes fire multiple times on dataflow + 
> streaming engine
> --------------------------------------------------------------------------------------
>
>                 Key: BEAM-7614
>                 URL: https://issues.apache.org/jira/browse/BEAM-7614
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Steve Niemitz
>            Priority: Major
>
> This is kind of hard to reproduce, but I've seen it happen a few times in the 
> wild now.
> We have a DoFn that sets an event-time timer at window.maxTimestamp, the 
> timer callback does something like:
> {code:java}
> def onWindowClose(
>   @StateId(...) key: ValueState[K], 
>   @StateId(...) values: CombiningState[V],
>   out: OutputReceiver[O], 
>   ...
> ) {
>   
>   val k = key.read()
>   val values = values.read()
>   out.output(KV.of(k, values)
>   key.clear()
>   values.clear()  
> }{code}
> Essentially, keep track of the key, accumulate values seen in a window, and 
> emit them at the end of the window.  
> ProcessElement is pretty simple as well:
> {code:java}
> def processElement(
>   ctx: ProcessContext, 
>   @StateId(...) key: ValueState[K], 
>   @StateId(...) values: CombiningState[V],
>   ...
> ) {
>   key.write(ctx.element().getKey())
>   value.add(ctx.element().getValue())
>   timer.set(window.maxTimestamp())
> }{code}
> However, *ONLY* when running on streaming engine (this doesn't happen 
> otherwise), I'll see cases where the onWindowClose timer fires with a null 
> key, and empty values.
> This can only happen if the timer fired twice, since it wouldn't have been 
> set if no elements had arrived, and if late data had arrived, it would have 
> set the key (and added to the combining state).  Also, we never have late 
> date in our pipeline.
> An interesting other thing I noticed is that these "phantom firings" seem to 
> happen ~10-15 minutes _AFTER_ the window closes.
> Again, its pretty rate, we'll have millions of keys in a window, and I'll 
> only see the error happen every few hours (with hourly windows).
> Let me know if I can clarify anything else!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to