Hi Xinyu,

There are two nice articles on the Beam website about stateful processing
that you may want to check out:

https://beam.apache.org/blog/2017/02/13/stateful-processing.html
https://beam.apache.org/blog/2017/08/28/timely-processing.html
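
In case it helps the discussion, here is a rough sketch of the accumulator
logic described in your mail. It is plain Java with no Beam dependency, so it
only models the semantics; the class name FirstViewAccumulator and the
updateAfterFiring() method are hypothetical and are not part of any Beam API.
With the state API, I would expect the two sets to live in separate state
cells, with a timer doing the periodic output, but treat that as an
assumption rather than a worked-out design.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java model of the two-part accumulator; no Beam types are used. */
public class FirstViewAccumulator {
    // Part 1: page ids already seen in this window; kept across firings.
    private final Set<String> seenPages = new HashSet<>();
    // Part 2: users who were first to view a page since the last firing.
    private final Set<String> firstViewers = new TreeSet<>();

    /** Loosely analogous to CombineFn addInput: record one (userId, pageId) event. */
    public void addInput(String userId, String pageId) {
        // Set.add returns true only when the page was not seen before.
        if (seenPages.add(pageId)) {
            firstViewers.add(userId);
        }
    }

    /** Loosely analogous to CombineFn extractOutput: a snapshot of part 2. */
    public Set<String> extractOutput() {
        return new TreeSet<>(firstViewers);
    }

    /** Hypothetical hook from the proposal: clear part 2, keep part 1. */
    public void updateAfterFiring() {
        firstViewers.clear();
    }

    public static void main(String[] args) {
        FirstViewAccumulator acc = new FirstViewAccumulator();
        acc.addInput("alice", "p1");
        acc.addInput("bob", "p1");   // p1 already seen, so bob is not recorded
        acc.addInput("bob", "p2");
        System.out.println(acc.extractOutput()); // [alice, bob]

        acc.updateAfterFiring();     // simulate what happens after an early firing
        acc.addInput("carol", "p1"); // p1 is still remembered after the firing
        acc.addInput("carol", "p3");
        System.out.println(acc.extractOutput()); // [carol]
    }
}
```

The point of the sketch is only that the "seen pages" part outlives each
firing while the "first viewers" part is reset; merging accumulators and
window expiry are left out.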

-Rui

On Tue, Oct 9, 2018 at 11:07 AM Reuven Lax <[email protected]> wrote:

> Have you considered using Beam's state API for this?
>
> On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu <[email protected]> wrote:
>
>> Hi, guys,
>>
>> Current triggering allows us to either discard or accumulate the state
>> after a window pane is fired. We use extractOutput() in CombineFn to
>> return the output value after the firing. All of this has been working
>> well for us. We do have a use case that does not seem to be handled here:
>> we would like to update the state after the firing. Let me illustrate
>> this use case with an example: we have a 10-min fixed window with a
>> repeated early trigger of 1 min over an input stream that contains
>> events of user id and page id. The accumulator for the window has two
>> parts: 1) the set of page ids already seen; 2) the set of user ids who
>> were the first to view a page in this window (this is done by looking up
>> #1). For each early firing, we want to output #2 and clear the second
>> part of the state, but we would like to keep #1 around for later
>> calculations in this window. This example might be too simple to make
>> sense, but it comes from one of our real use cases, which is needed for
>> some anti-abuse scenarios.
>>
>> To address this use case, is it OK to add an AccumT
>> updateAfterFiring(AccumT accumulator) method to the current CombineFn?
>> That way the user can choose to update the state partially if needed,
>> as in our use case. Any feedback is very welcome.
>>
>> Thanks,
>> Xinyu
