+1

Very nice proposal, and the API already looks very good. I guess the only thing people will still want to discuss is the naming of things. :-)

I just have one general remark about giving users access to state and timers. The Beam model takes great care to mostly shield users from the reality of out-of-order events. The windowing mostly deals with this internally and the watermarks provide some level of completeness guarantees. If users directly modify their state based on each arriving element they might run into problems if they don't take into account that elements can (will) arrive out-of-order.

For example, let's say they have three types of event: "start", "in-between", and "end". In the state machine they probably assume that the "start" event will arrive first and that the "end" event will arrive last. Due to slowdowns anywhere in the system they might not arrive in that order, however, and the state machine will trip up. This is an artificial example but I imagine there could be real-world cases where this plays a role.

Do we have any ideas on mitigating those kinds of problems or will we rely on users properly understanding that this could happen in their pipeline?

Cheers,
Aljoscha

On Wed, 27 Jul 2016 at 05:20 Kenneth Knowles <k...@google.com.invalid> wrote:

> Hi everyone,
>
> I would like to offer a proposal for a much-requested feature in Beam:
> Stateful processing in a DoFn. Please check out and comment on the proposal
> at this URL:
>
> https://s.apache.org/beam-state
>
> This proposal includes user-facing APIs for persistent state and timers.
> Together, these provide rich capabilities that have been called "per-key
> workflows", the subject of [BEAM-23].
>
> Note that this proposal has an important prerequisite: a new design for
> DoFn. The new DoFn is strongly motivated by this design for state and
> timers, but we should discuss it separately. I will start a separate thread
> for that.
>
> On this email thread, I'd like to try to focus the discussion on state &
> timers. And of course, please do comment on the particulars in the
> document.
>
> Kenn
>
> [BEAM-23] https://issues.apache.org/jira/browse/BEAM-23
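
To make the "start" / "in-between" / "end" example from the remark above concrete, here is a minimal sketch in plain Java. It deliberately does not use the proposed Beam state API. The class name NaiveStateMachine, the State enum, and the run helper are all hypothetical, made up purely for illustration. It only shows how a state machine that trusts arrival order ends up in an error state when the same three events arrive shuffled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, illustration-only state machine; not part of Beam or the proposal.
class NaiveStateMachine {
    enum State { IDLE, STARTED, ENDED, ERROR }

    private State state = State.IDLE;

    // Applies one event in ARRIVAL order, assuming "start" comes first
    // and "end" comes last. Any other ordering is treated as an error.
    void onEvent(String type) {
        switch (state) {
            case IDLE:
                state = type.equals("start") ? State.STARTED : State.ERROR;
                break;
            case STARTED:
                if (type.equals("end")) {
                    state = State.ENDED;
                } else if (!type.equals("in-between")) {
                    state = State.ERROR;
                }
                // "in-between" keeps the machine in STARTED
                break;
            default:
                // ENDED or ERROR: any further event is unexpected
                state = State.ERROR;
        }
    }

    // Feeds the events to a fresh machine in the given order and returns the final state.
    static State run(List<String> arrivals) {
        NaiveStateMachine machine = new NaiveStateMachine();
        for (String event : arrivals) {
            machine.onEvent(event);
        }
        return machine.state;
    }

    public static void main(String[] args) {
        // Events arrive in the order the user assumed.
        System.out.println(run(Arrays.asList("start", "in-between", "end")));
        // Same events, but "in-between" overtakes "start" on the way in.
        System.out.println(run(Arrays.asList("in-between", "start", "end")));
    }
}
```

With in-order arrival the machine finishes in ENDED; with "in-between" arriving before "start" it finishes in ERROR, even though both runs saw exactly the same set of events.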