@Reuven: thanks for letting me know. I thought that was expected. We ran into
this issue when we tried to use Stateful ParDo to process events from
session-windowed inputs. As a workaround, we ended up reassigning these events
to the global window and using our backend RocksDB state TTL to retire old
data.

Thanks,
Xinyu

On Tue, Oct 9, 2018 at 11:54 AM Reuven Lax <[email protected]> wrote:

> 2) is simply a bug that nobody has ever gotten around to fixing. Stateful
> ParDo should support merging windows such as sessions.
>
> On Tue, Oct 9, 2018 at 11:40 AM Xinyu Liu <[email protected]> wrote:
>
>> We do use stateful ParDo in the same job for a different use case (and we
>> did read through Kenn's blogs :) ). Here are the reasons why we prefer
>> using aggregation:
>>
>> 1) It's much more convenient for the user to define the window and trigger and
>> have the Combine on top of it. It's not very clear how early firing works
>> in Stateful ParDo, and it does seem to require more user effort to set up
>> the states/timers.
>>
>> 2) It seems Stateful ParDo doesn't support merging windows, e.g.
>> session windows. This is actually one of our use cases.
>>
>> 3) It seems quite general and more flexible for our users to allow
>> updating state after firing. I don't want to tell our future users to stay
>> away from Combine for this and handle the state explicitly themselves.
>>
>> Thanks,
>> Xinyu
>>
>>
>>
>> On Tue, Oct 9, 2018 at 11:27 AM Rui Wang <[email protected]> wrote:
>>
>>> Hi Xinyu,
>>>
>>> There are two nice articles on the Beam website about stateful processing
>>> that you may want to check out:
>>>
>>> https://beam.apache.org/blog/2017/02/13/stateful-processing.html
>>> https://beam.apache.org/blog/2017/08/28/timely-processing.html
>>>
>>> -Rui
>>>
>>> On Tue, Oct 9, 2018 at 11:07 AM Reuven Lax <[email protected]> wrote:
>>>
>>>> Have you considered using Beam's state API for this?
>>>>
>>>> On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi, guys,
>>>>>
>>>>> Current triggering allows us to either discard the state or accumulate
>>>>> the state after a window pane is fired. We use extractOutput() in
>>>>> CombineFn to return the output value after the firing. All of this has been
>>>>> working well for us. We do have a use case that does not seem to be handled
>>>>> here: we would like to update the state after the firing. Let me illustrate
>>>>> this use case with an example: we have a 10-min fixed window with a repeated
>>>>> early trigger of 1 min over an input stream that contains events of user id
>>>>> and page id. The accumulator for the window has two parts: 1) the set of page
>>>>> ids already seen; 2) the set of user ids who first view a page in this window
>>>>> (this is done by looking up #1). For each early firing, we want to output
>>>>> #2 and clear the second part of the state. But we would like to keep
>>>>> #1 around for later calculations in this window. This example might be too
>>>>> simple to make sense, but it comes from one of our real use cases, which is
>>>>> needed for some anti-abuse scenarios.
>>>>>
>>>>> To address this use case, is it OK to add an
>>>>> AccumT updateAfterFiring(AccumT accumulator) method to the current
>>>>> CombineFn? That way the user can choose to update the state partially if
>>>>> needed, e.g. for our use case. Any feedback is very welcome.
>>>>>
>>>>> Thanks,
>>>>> Xinyu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
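
For readers of this thread, the two-part accumulator from the original
message can be sketched roughly as below. This is only an illustration in
plain Java: FirstViewFn and Accum are hypothetical names, the class does not
extend Beam's CombineFn, and updateAfterFiring is the hook proposed in the
thread, not an existing Beam method.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a CombineFn; it does not extend any Beam class.
final class FirstViewFn {
    // Accumulator with the two parts described in the thread.
    static final class Accum {
        // Part 1: page ids already seen in this window (kept across firings).
        final Set<String> seenPageIds = new HashSet<>();
        // Part 2: user ids who were first to view a page since the last firing.
        final Set<String> firstViewers = new HashSet<>();
    }

    Accum createAccumulator() {
        return new Accum();
    }

    // Record one (userId, pageId) event.
    Accum addInput(Accum accum, String userId, String pageId) {
        // Set.add returns true only if the page id was not present yet,
        // which is the "look up #1" step from the thread.
        if (accum.seenPageIds.add(pageId)) {
            accum.firstViewers.add(userId);
        }
        return accum;
    }

    // Value emitted by a firing: a copy of part 2.
    Set<String> extractOutput(Accum accum) {
        return new HashSet<>(accum.firstViewers);
    }

    // Proposed hook (an assumption, not Beam API): after a firing,
    // clear part 2 but keep part 1 for the rest of the window.
    Accum updateAfterFiring(Accum accum) {
        accum.firstViewers.clear();
        return accum;
    }
}
```

Under this sketch, a runner would call extractOutput at each early firing and
then updateAfterFiring on the same accumulator, so a user who views an
already-seen page after a firing is not reported again.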
