There are also design decisions to be made around merging of timers. Also the potential to rethink automatically merging state vs letting the user have a callback - it might be a more natural fit with the low-level style and needs of stateful processing.
Kenn On Tue, Oct 9, 2018 at 12:33 PM Xinyu Liu <[email protected]> wrote: > @Reuven: thanks for letting me know. I thought that's expected. We ran > into this issue when we try to use the Stateful ParDo to process events > from session-windowed inputs. As a walk-around, we ended up reassigning > global window to these events and use our backend RocksDb state TTL to > retire old data. > > Thanks, > Xinyu > > On Tue, Oct 9, 2018 at 11:54 AM Reuven Lax <[email protected]> wrote: > >> 2) is simply a bug that nobody has ever gotten around to fixing. Stateful >> ParDo should support merging windows such as sessions. >> >> On Tue, Oct 9, 2018 at 11:40 AM Xinyu Liu <[email protected]> wrote: >> >>> We do use stateful ParDo in the same job for a different use case (and >>> we did read through Kenn's blogs :) ). Here are the reasons why we prefer >>> using aggregation: >>> >>> 1) It's much convenient for the user to define the window and trigger >>> and have the Combine on top of it. It's not very clear how early firing >>> works in Stateful Pardo, and it does seem to require more user effort to >>> set up the states/timers. >>> >>> 2) It seems Stateful ParDo doesn't support non-emergent windows, e.g. >>> session window. This is actually one of our use case. >>> >>> 3) It seems quite general and more flexible to our users to allow >>> updating state after firing. I don't want to tell our further users to stay >>> with from Combine for this and they have to handle the state explicitly. >>> >>> Thanks, >>> Xinyu >>> >>> >>> >>> On Tue, Oct 9, 2018 at 11:27 AM Rui Wang <[email protected]> wrote: >>> >>>> Hi Xinyu, >>>> >>>> There are two nice articles on Beam website about stateful processing >>>> that you may want to check out: >>>> >>>> https://beam.apache.org/blog/2017/02/13/stateful-processing.html >>>> https://beam.apache.org/blog/2017/08/28/timely-processing.html >>>> >>>> -Rui >>>> >>>> On Tue, Oct 9, 2018 at 11:07 AM Reuven Lax <[email protected]> wrote: >>>> >>>>> Have you considered using Beam's state API for this? >>>>> >>>>> On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi, guys, >>>>>> >>>>>> Current triggering allows us to either discard the state or >>>>>> accumulate the state after a window pane is fired. We use the >>>>>> extractOutput() in CombinFn to return the output value after the firing. >>>>>> All these have been working well for us. We do have a use case which >>>>>> seems >>>>>> not handled here: we would like to update the state after the firing. Let >>>>>> me illustrate this use case by an example: we have a 10-min fixed window >>>>>> with repeatedly early trigger of 1 min over an input stream which >>>>>> contains >>>>>> events of user id and page id. The accumulator for the window has two >>>>>> parts: 1) set of page ids already seen; 2) set of user ids who first >>>>>> views >>>>>> a page in this window (this is done by looking up #1). For each early >>>>>> firing, we want to output #2, and clear the second part of the state. But >>>>>> we would like to keep the #1 around for later calculations in this >>>>>> window. >>>>>> This example might be too simple to make sense, but it comes from one of >>>>>> our real use cases which is needed for some anti-abuse scenarios. >>>>>> >>>>>> To address this use case, is it OK to add a AccumT >>>>>> updateAfterFiring(AccumT >>>>>> accumulator) in current CombinFn? That way the user can choose to >>>>>> update the state partially if needed, e.g. for our use case. Any feedback >>>>>> is very welcome. >>>>>> >>>>>> Thanks, >>>>>> Xinyu >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>
