You should first try the obvious approach, a sliding window of 30 days
every 10 minutes, before you try the 60-day window sliding every 30 days.
Beam has some optimizations that assign a value to multiple windows but
process that value only once, even if it's in many windows. If that
doesn't perform well, then come back to dev@ and look into optimizing it.
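
To put rough numbers on the two options, here is a minimal plain-Python
sketch (not Beam code) that counts how many sliding windows a single
element lands in. The helper name sliding_window_starts and the assumption
that windows start at multiples of the period since the Unix epoch are
mine, for illustration only; they are not taken from the Beam API.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: window starts are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sliding_window_starts(timestamp, size, period):
    """Return the start of every window [start, start + size) holding timestamp.

    Hypothetical helper for illustration; windows are assumed to begin at
    multiples of `period` counted from EPOCH.
    """
    # Latest window start that is at or before the timestamp.
    start = timestamp - (timestamp - EPOCH) % period
    starts = []
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp - size:
        starts.append(start)
        start -= period
    return starts


event_time = datetime(2019, 10, 29, 13, 22, tzinfo=timezone.utc)

# Option 1: 30-day windows sliding every 10 minutes.
fine = sliding_window_starts(
    event_time, size=timedelta(days=30), period=timedelta(minutes=10))

# Option 2: 60-day windows sliding every 30 days.
coarse = sliding_window_starts(
    event_time, size=timedelta(days=60), period=timedelta(days=30))

print(len(fine))    # 4320 windows per element
print(len(coarse))  # 2 windows per element
```

So each element logically belongs to 4320 windows in the first option and
2 in the second. Whether that difference actually hurts depends on how the
runner handles the multi-window assignment, which is why it is worth
measuring the simple pipeline first.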

On Tue, Oct 29, 2019 at 1:22 PM Aaron Dixon <[email protected]> wrote:

> Hi, I am new to Beam.
>
> I would like to accumulate data over a 30-day period and perform a running
> aggregation over this data, say every 10 minutes.
>
> I could use a sliding window of 30 days every 10 minutes (triggering at
> end of window) but this seems grossly inefficient (both in terms of # of
> windows at play and # of events duplicated across these windows).
>
> A more efficient strategy seems to be to use a sliding window of 60 days
> every 30 days -- *triggering* every 10 minutes -- so that I'm guaranteed
> to have 30 days' worth of data aggregated/combined in at least one of the 2
> at-play sliding windows.
>
> The last piece of this puzzle however would be to do a final global
> aggregation over *only the keys from the latest trigger of the earlier
> sliding window*.
>
> But Beam does not seem to offer a way to orchestrate this, even though
> this seems like it would be a pretty common or fundamental ask.
>
> One thought I had was to re-window in a way that would isolate keys
> triggered at the same time, in the same window but I don't see any
> contracts from Beam that would allow an approach like that.
>
> What am I missing?
>
>
>
