Re: aggregating over triggered results

Robert Bradshaw Tue, 29 Oct 2019 13:40:49 -0700

No matter how the problem is structured, computing 30-day aggregations
for every 10-minute window requires storing at least 30 days / 10 min =
4320 sub-aggregations. In Beam, the elements themselves are not
stored in every window, only the intermediate aggregates.
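
As a rough sketch of that arithmetic, here is the count in plain Python
(no Beam involved). The function names and the sample timestamp are
illustrative assumptions, not Beam APIs; the window-start calculation
only mimics how a sliding window of a given size and period covers a
timestamp.

```python
from datetime import timedelta

WINDOW_SIZE = timedelta(days=30)
WINDOW_PERIOD = timedelta(minutes=10)


def windows_per_element(size, period):
    """Number of sliding windows that contain any single timestamp."""
    # timedelta // timedelta yields a plain int.
    return size // period


def window_starts(ts_seconds, size, period):
    """Start times (epoch seconds) of every window [s, s + size) containing ts."""
    size_s = int(size.total_seconds())
    period_s = int(period.total_seconds())
    # Latest window start at or before the timestamp, aligned to the period.
    last_start = ts_seconds - (ts_seconds % period_s)
    # Walk backwards one period at a time while the window still covers ts.
    return list(range(last_start, ts_seconds - size_s, -period_s))


if __name__ == "__main__":
    ts = 1572381649  # arbitrary sample event time (assumption)
    print(windows_per_element(WINDOW_SIZE, WINDOW_PERIOD))
    print(len(window_starts(ts, WINDOW_SIZE, WINDOW_PERIOD)))
```

Each of those windows needs one intermediate aggregate per key, which
is where the lower bound on stored sub-aggregations comes from.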


I second Luke's suggestion to try it out and see if this is indeed a
prohibitive bottleneck.

On Tue, Oct 29, 2019 at 1:29 PM Luke Cwik <[email protected]> wrote:
>
> You should first try the obvious answer of using a sliding window of 30 days 
> every 10 minutes before you try the 60 days every 30 days.
> Beam has some optimizations which will assign a value to multiple windows and
> only process that value once even if it's in many windows. If that doesn't
> perform well, then come back to dev@ and look to optimize.
>
> On Tue, Oct 29, 2019 at 1:22 PM Aaron Dixon <[email protected]> wrote:
>>
>> Hi, I am new to Beam.
>>
>> I would like to accumulate data over a 30-day period and perform a running
>> aggregation over this data, say every 10 minutes.
>>
>> I could use a sliding window of 30 days every 10 minutes (triggering at end 
>> of window) but this seems grossly inefficient (both in terms of # of windows 
>> at play and # of events duplicated across these windows).
>>
>> A more efficient strategy seems to be to use a sliding window of 60 days 
>> every 30 days -- triggering every 10 minutes -- so that I'm guaranteed to 
>> have 30 days worth of data aggregated/combined in at least one of the 2 
>> at-play sliding windows.
>>
>> The last piece of this puzzle however would be to do a final global 
>> aggregation over only the keys from the latest trigger of the earlier 
>> sliding window.
>>
>> But Beam does not seem to offer a way to orchestrate this, even though it
>> seems like it would be a pretty common or fundamental ask.
>>
>> One thought I had was to re-window in a way that would isolate keys 
>> triggered at the same time, in the same window but I don't see any contracts 
>> from Beam that would allow an approach like that.
>>
>> What am I missing?
>>
>>
