There are also design decisions to be made around merging of timers. Also
the potential to rethink automatically merging state vs letting the user
have a callback - it might be a more natural fit with the low-level style
and needs of stateful processing.

Kenn

On Tue, Oct 9, 2018 at 12:33 PM Xinyu Liu <[email protected]> wrote:

> @Reuven: thanks for letting me know. I thought that's expected. We ran
> into this issue when we try to use the Stateful ParDo to process events
> from session-windowed inputs. As a walk-around, we ended up reassigning
> global window to these events and use our backend RocksDb state TTL to
> retire old data.
>
> Thanks,
> Xinyu
>
> On Tue, Oct 9, 2018 at 11:54 AM Reuven Lax <[email protected]> wrote:
>
>> 2) is simply a bug that nobody has ever gotten around to fixing. Stateful
>> ParDo should support merging windows such as sessions.
>>
>> On Tue, Oct 9, 2018 at 11:40 AM Xinyu Liu <[email protected]> wrote:
>>
>>> We do use stateful ParDo in the same job for a different use case (and
>>> we did read through Kenn's blogs :) ). Here are the reasons why we prefer
>>> using aggregation:
>>>
>>> 1) It's much convenient for the user to define the window and trigger
>>> and have the Combine on top of it. It's not very clear how early firing
>>> works in Stateful Pardo, and it does seem to require more user effort to
>>> set up the states/timers.
>>>
>>> 2) It seems Stateful ParDo doesn't support non-emergent windows, e.g.
>>> session window. This is actually one of our use case.
>>>
>>> 3) It seems quite general and more flexible to our users to allow
>>> updating state after firing. I don't want to tell our further users to stay
>>> with from Combine for this and they have to handle the state explicitly.
>>>
>>> Thanks,
>>> Xinyu
>>>
>>>
>>>
>>> On Tue, Oct 9, 2018 at 11:27 AM Rui Wang <[email protected]> wrote:
>>>
>>>> Hi Xinyu,
>>>>
>>>> There are two nice articles on Beam website about stateful processing
>>>> that you may want to check out:
>>>>
>>>> https://beam.apache.org/blog/2017/02/13/stateful-processing.html
>>>> https://beam.apache.org/blog/2017/08/28/timely-processing.html
>>>>
>>>> -Rui
>>>>
>>>> On Tue, Oct 9, 2018 at 11:07 AM Reuven Lax <[email protected]> wrote:
>>>>
>>>>> Have you considered using Beam's state API for this?
>>>>>
>>>>> On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi, guys,
>>>>>>
>>>>>> Current triggering allows us to either discard the state or
>>>>>> accumulate the state after a window pane is fired. We use the
>>>>>> extractOutput() in CombinFn to return the output value after the firing.
>>>>>> All these have been working well for us. We do have a use case which 
>>>>>> seems
>>>>>> not handled here: we would like to update the state after the firing. Let
>>>>>> me illustrate this use case by an example: we have a 10-min fixed window
>>>>>> with repeatedly early trigger of 1 min over an input stream which 
>>>>>> contains
>>>>>> events of user id and page id. The accumulator for the window has two
>>>>>> parts: 1) set of page ids already seen; 2) set of user ids who first 
>>>>>> views
>>>>>> a page in this window (this is done by looking up #1). For each early
>>>>>> firing, we want to output #2, and clear the second part of the state. But
>>>>>> we would like to keep the #1 around for later calculations in this 
>>>>>> window.
>>>>>> This example might be too simple to make sense, but it comes from one of
>>>>>> our real use cases which is needed for some anti-abuse scenarios.
>>>>>>
>>>>>> To address this use case, is it OK to add a AccumT 
>>>>>> updateAfterFiring(AccumT
>>>>>> accumulator) in current CombinFn? That way the user can choose to
>>>>>> update the state partially if needed, e.g. for our use case. Any feedback
>>>>>> is very welcome.
>>>>>>
>>>>>> Thanks,
>>>>>> Xinyu
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>

Reply via email to