You should first try the obvious approach, a sliding window of 30 days every 10 minutes, before you try the 60-day window every 30 days. Beam has some optimizations that assign a value to multiple windows and process that value only once, even if it's in many windows. If that doesn't perform well, come back to dev@ and we can look at how to optimize.
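For a rough sense of the numbers involved, here is a minimal sketch in plain Python (not Beam code) that lists the sliding windows a single element falls into under each strategy. The helper `sliding_window_starts` and the sample timestamp are hypothetical names and values made up for illustration; the sketch assumes windows are half-open intervals `[start, start + size)` whose starts are aligned to multiples of the period.

```python
from datetime import timedelta

DAY = int(timedelta(days=1).total_seconds())
MINUTE = 60


def sliding_window_starts(ts, size, period):
    """Return the start times (in seconds) of every sliding window containing ts.

    Assumes windows are [start, start + size) and that starts fall on
    multiples of `period`. Illustrative only; not taken from Beam.
    """
    # Latest window start that is <= ts.
    last_start = ts - (ts % period)
    # Walk backwards one period at a time while the window still covers ts.
    return list(range(last_start, last_start - size, -period))


ts = 1_572_380_520  # arbitrary example timestamp, in seconds

fine = sliding_window_starts(ts, size=30 * DAY, period=10 * MINUTE)
coarse = sliding_window_starts(ts, size=60 * DAY, period=30 * DAY)

print(len(fine))    # windows per element: 30 days every 10 minutes
print(len(coarse))  # windows per element: 60 days every 30 days
```

So each element belongs to 4320 windows in the first case and 2 in the second. That count is only the logical assignment; as noted above, it does not by itself say how much work Beam does per element. In the Beam Python SDK the first strategy would correspond to something like `SlidingWindows(size=30 * DAY, period=10 * MINUTE)` from `apache_beam.transforms.window`.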
On Tue, Oct 29, 2019 at 1:22 PM Aaron Dixon <[email protected]> wrote:
> Hi I am new to Beam.
>
> I would like to accumulate data over 30 day period and perform a running
> aggregation over this data, say every 10 minutes.
>
> I could use a sliding window of 30 days every 10 minutes (triggering at
> end of window) but this seems grossly inefficient (both in terms of # of
> windows at play and # of events duplicated across these windows).
>
> A more efficient strategy seems to be to use a sliding window of 60 days
> every 30 days -- *triggering* every 10 minutes -- so that I'm guaranteed
> to have 30 days worth of data aggregated/combined in at least one of the 2
> at-play sliding windows.
>
> The last piece of this puzzle however would be to do a final global
> aggregation over *only the keys from the latest trigger of the earlier
> sliding window*.
>
> But Beam does not seem to offer a way to orchestrate this. Even though
> this seems like it would be a pretty common or fundamental ask.
>
> One thought I had was to re-window in a way that would isolate keys
> triggered at the same time, in the same window but I don't see any
> contracts from Beam that would allow an approach like that.
>
> What am I missing?
