+1. Very nice proposal, and the API already looks very good. I guess the
only thing people will still want to discuss is the naming of things. :-)

I just have one general remark about giving users access to state and
timers. The Beam model takes great care to mostly shield users from the
reality of out-of-order events. The windowing mostly deals with this
internally and the watermarks provide some level of completeness
guarantees. If users directly modify their state based on each arriving
element, they might run into problems if they don't take into account that
elements can (and will) arrive out of order. For example, let's say they
have three types of events: "start", "in-between", and "end". In the state
machine, they probably assume that the "start" event will arrive first and
that the "end" event will arrive last. Due to slowdowns anywhere in the
system, however, the events might not arrive in that order, and the state
machine will trip up. This is an artificial example, but I imagine there
could be real-world cases where this plays a role.
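
To make the example concrete, here is a minimal sketch in plain Python. It
does not use the proposed Beam API; the names NaiveStateMachine, ORDER, and
run are made up purely for illustration. It only shows how a state machine
that assumes in-order arrival behaves when the same three events show up in
a different order:

```python
# Hypothetical illustration only: none of these names come from the Beam API.
ORDER = ["start", "in-between", "end"]


class NaiveStateMachine:
    """Assumes events for a key arrive as start, in-between, end."""

    def __init__(self):
        self.expected = 0   # index into ORDER of the next expected event
        self.errors = []    # events that arrived when they were not expected

    def process(self, event):
        # Advance only if the event is exactly the one we are waiting for.
        if self.expected < len(ORDER) and event == ORDER[self.expected]:
            self.expected += 1
        else:
            self.errors.append(event)

    @property
    def finished(self):
        return self.expected == len(ORDER)


def run(events):
    """Feed events to a fresh state machine in arrival order."""
    machine = NaiveStateMachine()
    for event in events:
        machine.process(event)
    return machine


in_order = run(["start", "in-between", "end"])
out_of_order = run(["in-between", "start", "end"])
```

With in-order arrival the machine finishes cleanly. With "in-between"
arriving before "start", it never reaches the finished state and records
two unexpected events, even though all three events were delivered.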

Do we have any ideas on mitigating those kinds of problems or will we rely
on users properly understanding that this could happen in their pipeline?

Cheers,
Aljoscha

On Wed, 27 Jul 2016 at 05:20 Kenneth Knowles <k...@google.com.invalid> wrote:

> Hi everyone,
>
>
> I would like to offer a proposal for a much-requested feature in Beam:
> Stateful processing in a DoFn. Please check out and comment on the proposal
> at this URL:
>
>
>   https://s.apache.org/beam-state
>
>
> This proposal includes user-facing APIs for persistent state and timers.
> Together, these provide rich capabilities that have been called "per-key
> workflows", the subject of [BEAM-23].
>
>
> Note that this proposal has an important prerequisite: a new design for
> DoFn. The new DoFn is strongly motivated by this design for state and
> timers, but we should discuss it separately. I will start a separate thread
> for that.
>
>
> On this email thread, I'd like to try to focus the discussion on state &
> timers. And of course, please do comment on the particulars in the
> document.
>
>
> Kenn
>
>
> [BEAM-23] https://issues.apache.org/jira/browse/BEAM-23
>
