[
https://issues.apache.org/jira/browse/BEAM-9308?focusedWorklogId=391029&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391029
]
ASF GitHub Bot logged work on BEAM-9308:
----------------------------------------
Author: ASF GitHub Bot
Created on: 22/Feb/20 02:24
Start Date: 22/Feb/20 02:24
Worklog Time Spent: 10m
Work Description: steveniemitz commented on issue #10852: [BEAM-9308]
Decorrelate state cleanup timers
URL: https://github.com/apache/beam/pull/10852#issuecomment-589907688
Yay thanks for looking at this. I'll address your points in reverse order :P
> Maybe we need a better prioritization strategy so that large #s of timers
> don't starve out elements?
I think that'd be the best overall option, but ideally we'd have variable
priority, i.e., state cleanup timers should be low priority, while user timers
should be the same priority as "normal" elements. In the end though, if we end
up with state cleanup timers delayed by N minutes because they are
deprioritized, that seems like we'd be in the same spot as explicitly
decorrelating them here.
> Delaying the timer will also prevent downstream aggregations from firing.
> 3 minutes could cause issues if the window itself is much smaller.
Agreed, I sort of touched on this in my comment about letting the duration
be configurable. Ideally it'd be some fraction of the window duration itself.
I'm not sure it actually will delay the downstream aggregations from firing
however, since the firing time is set to after the window closes (maxTimestamp
+ allowedLateness + 1ms), so once these begin firing, the watermark has already
passed the end of the window. Or am I misunderstanding something here?
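
For illustration, the arithmetic in this reply can be sketched as below. This is a minimal sketch under stated assumptions, not Beam's implementation: the class and method names, the `fraction` parameter, and the `maxSpread` cap are hypothetical, and plain `java.time` types stand in for whatever the runner actually uses. It shows the base firing time (maxTimestamp + allowedLateness + 1ms) and one possible way to derive the extra delay from the window duration so that small windows are not delayed by a fixed 3 minutes.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; the names here are illustrative and not part of Beam's API. */
final class CleanupTimerSketch {
  private CleanupTimerSketch() {}

  /** Base firing time as described above: maxTimestamp + allowedLateness + 1ms. */
  static Instant baseCleanupTime(Instant windowMaxTimestamp, Duration allowedLateness) {
    return windowMaxTimestamp.plus(allowedLateness).plusMillis(1);
  }

  /**
   * Assumed policy: the spread is a fraction of the window size, capped at maxSpread.
   * Both parameters are made up for this sketch.
   */
  static Duration spreadFor(Duration windowSize, double fraction, Duration maxSpread) {
    Duration scaled = Duration.ofMillis((long) (windowSize.toMillis() * fraction));
    return scaled.compareTo(maxSpread) < 0 ? scaled : maxSpread;
  }
}
```

With these assumptions, a 10 second window and a fraction of 0.25 gives a spread of 2.5 seconds, while a 1 hour window is capped at the configured maximum. The base time is always strictly after the window's maxTimestamp, which is the point the reply relies on.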
> We want to reuse this timer for OnWindowExpiration, and this will delay
> all those callbacks as well.
I'd actually argue that's preferable, since you'd have the same problem
there as well (potentially millions of timers firing at the same time).
> We currently rely on the state cleanup timer for watermark holds.
Is this true? The state cleanup timer is already set past the end of the
window, so by the time the timer fires the window has already closed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
Issue Time Tracking
-------------------
Worklog Id: (was: 391029)
Time Spent: 50m (was: 40m)
> Optimize state cleanup at end-of-window
> ---------------------------------------
>
> Key: BEAM-9308
> URL: https://issues.apache.org/jira/browse/BEAM-9308
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Steve Niemitz
> Assignee: Steve Niemitz
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When using state with a large keyspace, you can end up with a large number of
> state cleanup timers all set to fire 1ms after the end of a window. This can
> cause a momentary (I've observed 1-3 minute) lag in processing while Windmill
> and the Java harness fire and process these cleanup timers.
> By spreading the firing over a short period after the end of the window, we
> can decorrelate the firing of the timers and smooth the load out, resulting
> in much less impact from state cleanup.
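
The spreading described in the issue could be sketched roughly as follows. This is an illustration only: the issue does not say how the offset is chosen, so the per-key hash used here is an assumption (a random offset would serve the same purpose), and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of decorrelating cleanup timers; not Beam's implementation. */
final class DecorrelatedCleanupSketch {
  private DecorrelatedCleanupSketch() {}

  /**
   * Shifts the cleanup time by a deterministic per-key offset in [0, spread).
   * Deriving the offset from the key's hash is an assumption of this sketch.
   */
  static Instant cleanupTimeForKey(Instant baseCleanupTime, String key, Duration spread) {
    long spreadMillis = spread.toMillis();
    if (spreadMillis <= 0) {
      // No spreading requested: every key fires at the base time.
      return baseCleanupTime;
    }
    // floorMod keeps the offset non-negative even for negative hash codes.
    long offsetMillis = Math.floorMod((long) key.hashCode(), spreadMillis);
    return baseCleanupTime.plusMillis(offsetMillis);
  }
}
```

Because the offset depends only on the key, resetting the timer for the same key and window yields the same firing time, while different keys land at different points within the spread instead of all firing in the same millisecond.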
--
This message was sent by Atlassian Jira
(v8.3.4#803005)