Hi Xinyu, There are two nice articles on Beam website about stateful processing that you may want to check out:
https://beam.apache.org/blog/2017/02/13/stateful-processing.html https://beam.apache.org/blog/2017/08/28/timely-processing.html -Rui On Tue, Oct 9, 2018 at 11:07 AM Reuven Lax <[email protected]> wrote: > Have you considered using Beam's state API for this? > > On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu <[email protected]> wrote: > >> Hi, guys, >> >> Current triggering allows us to either discard the state or accumulate >> the state after a window pane is fired. We use the extractOutput() in >> CombinFn to return the output value after the firing. All these have been >> working well for us. We do have a use case which seems not handled here: we >> would like to update the state after the firing. Let me illustrate this use >> case by an example: we have a 10-min fixed window with repeatedly early >> trigger of 1 min over an input stream which contains events of user id and >> page id. The accumulator for the window has two parts: 1) set of page ids >> already seen; 2) set of user ids who first views a page in this window >> (this is done by looking up #1). For each early firing, we want to output >> #2, and clear the second part of the state. But we would like to keep the >> #1 around for later calculations in this window. This example might be too >> simple to make sense, but it comes from one of our real use cases which is >> needed for some anti-abuse scenarios. >> >> To address this use case, is it OK to add a AccumT updateAfterFiring(AccumT >> accumulator) in current CombinFn? That way the user can choose to update >> the state partially if needed, e.g. for our use case. Any feedback is very >> welcome. >> >> Thanks, >> Xinyu >> >> >> >> >>
