We do use stateful ParDo in the same job for a different use case (and we
did read through Kenn's blogs :) ). Here are the reasons why we prefer
using aggregation:

1) It's much more convenient for the user to define the window and trigger
and have the Combine on top of it. It's not very clear how early firing works
in stateful ParDo, and it does seem to require more user effort to set up
the states/timers.

2) It seems stateful ParDo doesn't support merging windows, e.g.
session windows. This is actually one of our use cases.

3) It seems quite general and more flexible for our users to allow updating
state after firing. I don't want to tell our future users to stay away
from Combine for this and handle the state explicitly themselves.
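
A rough sketch of the use case from the original mail (quoted below) may make
this more concrete. It is plain Java that only mirrors the shape of a
CombineFn and has no Beam dependency. The View and Accum classes are made up
for illustration, and updateAfterFiring is the hypothetical hook being
proposed, not an existing Beam API.

```java
import java.util.HashSet;
import java.util.Set;

/** Plain-Java sketch; mirrors the shape of CombineFn but does not use Beam. */
public class FirstViewersSketch {

  /** One input event: a user viewing a page (hypothetical type). */
  static final class View {
    final String userId;
    final String pageId;

    View(String userId, String pageId) {
      this.userId = userId;
      this.pageId = pageId;
    }
  }

  /** Accumulator with the two parts described in the original mail. */
  static final class Accum {
    final Set<String> seenPages = new HashSet<>();    // part 1: kept for the whole window
    final Set<String> firstViewers = new HashSet<>(); // part 2: reset after each firing
  }

  static Accum createAccumulator() {
    return new Accum();
  }

  static Accum addInput(Accum acc, View v) {
    // Set.add returns true only if the page id was not seen before in this window.
    if (acc.seenPages.add(v.pageId)) {
      acc.firstViewers.add(v.userId);
    }
    return acc;
  }

  static Set<String> extractOutput(Accum acc) {
    // Copy so the emitted pane is not affected by later updates.
    return new HashSet<>(acc.firstViewers);
  }

  /** Hypothetical hook proposed in this thread; assumed to run after each firing. */
  static Accum updateAfterFiring(Accum acc) {
    acc.firstViewers.clear(); // drop part 2, keep part 1
    return acc;
  }

  public static void main(String[] args) {
    Accum acc = createAccumulator();
    addInput(acc, new View("alice", "p1"));
    addInput(acc, new View("bob", "p1"));   // p1 already seen, bob is not recorded
    System.out.println(extractOutput(acc)); // first early firing: [alice]
    acc = updateAfterFiring(acc);
    addInput(acc, new View("carol", "p1")); // p1 is still remembered after the firing
    addInput(acc, new View("dave", "p2"));
    System.out.println(extractOutput(acc)); // second early firing: [dave]
  }
}
```

With discarding mode the second pane would also contain carol, because the
set of seen pages would be lost. With accumulating mode it would contain alice
again. Neither matches what we need.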

Thanks,
Xinyu



On Tue, Oct 9, 2018 at 11:27 AM Rui Wang <[email protected]> wrote:

> Hi Xinyu,
>
> There are two nice articles on the Beam website about stateful processing
> that you may want to check out:
>
> https://beam.apache.org/blog/2017/02/13/stateful-processing.html
> https://beam.apache.org/blog/2017/08/28/timely-processing.html
>
> -Rui
>
> On Tue, Oct 9, 2018 at 11:07 AM Reuven Lax <[email protected]> wrote:
>
>> Have you considered using Beam's state API for this?
>>
>> On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu <[email protected]> wrote:
>>
>>> Hi, guys,
>>>
>>> Current triggering allows us to either discard the state or accumulate
>>> the state after a window pane is fired. We use extractOutput() in
>>> CombineFn to return the output value after the firing. All of this has
>>> been working well for us. We do have a use case that does not seem to be
>>> handled here: we would like to update the state after the firing. Let me
>>> illustrate this use case with an example: we have a 10-min fixed window
>>> with a repeated early trigger every 1 min over an input stream which
>>> contains events of user id and page id. The accumulator for the window
>>> has two parts: 1) the set of page ids already seen; 2) the set of user ids
>>> who are the first to view a page in this window (this is done by looking
>>> up #1). For each early firing, we want to output #2 and clear the second
>>> part of the state. But we would like to keep #1 around for later
>>> calculations in this window. This example might be too simple to make
>>> sense, but it comes from one of our real use cases, which is needed for
>>> some anti-abuse scenarios.
>>>
>>> To address this use case, is it OK to add an
>>> AccumT updateAfterFiring(AccumT accumulator) method to the current
>>> CombineFn? That way the user can choose to update the state partially if
>>> needed, e.g. for our use case. Any feedback is very welcome.
>>>
>>> Thanks,
>>> Xinyu
>>>
>>>
>>>
>>>
>>>
