Currently the Python SDK does not make ValueState available to users. My initial inclination was to go ahead and implement it there to be consistent with Java, but Robert brings up a great point here: ValueState has an inherent race condition for out-of-order data, and a lot of its use cases can actually be implemented with a CombiningState instead.
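To make the race concrete, here is a minimal sketch in plain Python. It does not use Beam at all: the EVENTS data, the two helper functions and the LatestByTimestamp class are hypothetical names made up for illustration, and the class only mimics the general shape of a CombineFn. The idea is that a plain overwrite depends on arrival order, while a "latest by timestamp" combiner gives the same answer for any ordering.

```python
import itertools

# Hypothetical (timestamp, value) events; not taken from any real pipeline.
EVENTS = [(1, "a"), (2, "b"), (3, "c")]


def value_state_result(events):
    """Mimics ValueState-style writes: each element overwrites the stored value."""
    stored = None
    for _timestamp, value in events:
        stored = value  # last arrival wins, regardless of event time
    return stored


class LatestByTimestamp:
    """Shaped like a CombineFn (assumption); keeps the element with the largest timestamp."""

    def create_accumulator(self):
        return None

    def add_input(self, accumulator, element):
        # Keep whichever (timestamp, value) pair has the larger timestamp.
        if accumulator is None or element[0] > accumulator[0]:
            return element
        return accumulator

    def merge_accumulators(self, accumulators):
        result = None
        for acc in accumulators:
            if acc is not None:
                result = self.add_input(result, acc)
        return result

    def extract_output(self, accumulator):
        return None if accumulator is None else accumulator[1]


def combining_state_result(events):
    """Feeds the events through the combiner in the given arrival order."""
    fn = LatestByTimestamp()
    acc = fn.create_accumulator()
    for event in events:
        acc = fn.add_input(acc, event)
    return fn.extract_output(acc)


# Try every possible arrival order of the same three events.
orderings = list(itertools.permutations(EVENTS))
value_results = {value_state_result(o) for o in orderings}
combining_results = {combining_state_result(o) for o in orderings}
print(sorted(value_results))      # overwrite: result depends on arrival order
print(sorted(combining_results))  # combiner: one result for every order
```

With the overwrite, any of the three values can end up stored depending on which element happens to arrive last; with the combiner, the value carrying the largest timestamp is kept no matter how the elements are ordered.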
It seems to me that at the very least we should discourage the use of ValueState by noting the danger in the documentation and preferring CombiningState in examples, and perhaps we should go further and deprecate it in Java and not implement it in Python. Either way, I think we should be consistent between Java and Python.

I'm curious what people think about this. Are there use cases that we really need to keep ValueState around for?

Brian

---------- Forwarded message ---------
From: Robert Bradshaw <rober...@google.com>
Date: Thu, Apr 25, 2019, 08:31
Subject: Re: [docs] Python State & Timers
To: dev <dev@beam.apache.org>

On Thu, Apr 25, 2019, 5:26 PM Maximilian Michels <m...@apache.org> wrote:

> Completely agree that CombiningState is nicer in this example. Users may
> still want to use ValueState when there is nothing to combine.

I've always had trouble coming up with any good examples of this.

> Also, users already know ValueState from the Java SDK.

Maybe we should deprecate that :)

> On 25.04.19 17:12, Robert Bradshaw wrote:
> > On Thu, Apr 25, 2019 at 4:58 PM Maximilian Michels <m...@apache.org> wrote:
> >>
> >> I forgot to give an example, just to clarify for others:
> >>
> >>> What was the specific example that was less natural?
> >>
> >> Basically every time we use ListState to express ValueState, e.g.
> >>
> >>     next_index, = list(state.read()) or [0]
> >>
> >> Taken from:
> >> https://github.com/apache/beam/pull/8363/files#diff-ba1a2aed98079ccce869cd660ca9d97dR301
> >
> > Yes, ListState is much less natural here. I think generally
> > CombiningValue is often a better replacement. E.g.
> > the Java example reads
> >
> >     public void processElement(
> >         ProcessContext context, @StateId("index") ValueState<Integer> index) {
> >       int current = firstNonNull(index.read(), 0);
> >       context.output(KV.of(current, context.element()));
> >       index.write(current+1);
> >     }
> >
> > which is replaced with bag state
> >
> >     def process(self, element, state=DoFn.StateParam(INDEX_STATE)):
> >       next_index, = list(state.read()) or [0]
> >       yield (element, next_index)
> >       state.clear()
> >       state.add(next_index + 1)
> >
> > whereas CombiningState would be more natural (than ListState, and
> > arguably than even ValueState), giving
> >
> >     def process(self, element, index=DoFn.StateParam(INDEX_STATE)):
> >       yield element, index.read()
> >       index.add(1)
> >
> >>
> >> -Max
> >>
> >> On 25.04.19 16:40, Robert Bradshaw wrote:
> >>> https://github.com/apache/beam/pull/8402
> >>>
> >>> On Thu, Apr 25, 2019 at 4:26 PM Robert Bradshaw <rober...@google.com> wrote:
> >>>>
> >>>> Oh, this is for the indexing example.
> >>>>
> >>>> I actually think using CombiningState is more cleaner than ValueState.
> >>>>
> >>>> https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L262
> >>>>
> >>>> (The fact that one must specify the accumulator coder is, however,
> >>>> unfortunate. We should probably infer that if we can.)
> >>>>
> >>>> On Thu, Apr 25, 2019 at 4:19 PM Robert Bradshaw <rober...@google.com> wrote:
> >>>>>
> >>>>> The desire was to avoid the implicit disallowed combination wart in
> >>>>> Python (until we could make sense of it), and also ValueState could be
> >>>>> surprising with respect to older values overwriting newer ones. What
> >>>>> was the specific example that was less natural?
> >>>>>
> >>>>> On Thu, Apr 25, 2019 at 3:01 PM Maximilian Michels <m...@apache.org> wrote:
> >>>>>>
> >>>>>> @Pablo: Thanks for following up with the PR! :)
> >>>>>>
> >>>>>> @Brian: I was wondering about this as well. It makes the Python state
> >>>>>> code a bit unnatural. I'd suggest to add a ValueState wrapper around
> >>>>>> ListState/CombiningState.
> >>>>>>
> >>>>>> @Robert: Like Reuven pointed out, we can disallow ValueState for merging
> >>>>>> windows with state.
> >>>>>>
> >>>>>> @Reza: Great. Let's make sure it has Python examples out of the box.
> >>>>>> Either Pablo or me could help there.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Max
> >>>>>>
> >>>>>> On 25.04.19 04:14, Reza Ardeshir Rokni wrote:
> >>>>>>> Pablo, Kenneth and I have a new blog ready for publication which covers
> >>>>>>> how to create a "looping timer" it allows for default values to be
> >>>>>>> created in a window when no incoming elements exists. We just need to
> >>>>>>> clear a few bits before publication, but would be great to have that
> >>>>>>> also include a python example, I wrote it in java...
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>> Reza
> >>>>>>>
> >>>>>>> On Thu, 25 Apr 2019 at 04:34, Reuven Lax <re...@google.com> wrote:
> >>>>>>>
> >>>>>>>     Well state is still not implemented for merging windows even for
> >>>>>>>     Java (though I believe the idea was to disallow ValueState there).
> >>>>>>>
> >>>>>>>     On Wed, Apr 24, 2019 at 1:11 PM Robert Bradshaw <rober...@google.com> wrote:
> >>>>>>>
> >>>>>>>         It was unclear what the semantics were for ValueState for merging
> >>>>>>>         windows. (It's also a bit weird as it's inherently a race condition
> >>>>>>>         wrt element ordering, unlike Bag and CombineState, though you can
> >>>>>>>         always implement it as a CombineState that always returns the latest
> >>>>>>>         value which is a bit more explicit about the dangers here.)
> >>>>>>>
> >>>>>>>         On Wed, Apr 24, 2019 at 10:08 PM Brian Hulette <bhule...@google.com> wrote:
> >>>>>>>         >
> >>>>>>>         > That's a great idea! I thought about this too after those
> >>>>>>>         > posts came up on the list recently. I started to look into it,
> >>>>>>>         > but I noticed that there's actually no implementation of
> >>>>>>>         > ValueState in userstate. Is there a reason for that? I started
> >>>>>>>         > to work on a patch to add it but I was just curious if there was
> >>>>>>>         > some reason it was omitted that I should be aware of.
> >>>>>>>         >
> >>>>>>>         > We could certainly replicate the example without ValueState
> >>>>>>>         > by using BagState and clearing it before each write, but it
> >>>>>>>         > would be nice if we could draw a direct parallel.
> >>>>>>>         >
> >>>>>>>         > Brian
> >>>>>>>         >
> >>>>>>>         > On Fri, Apr 12, 2019 at 7:05 AM Maximilian Michels <m...@apache.org> wrote:
> >>>>>>>         >>
> >>>>>>>         >> > It would probably be pretty easy to add the corresponding
> >>>>>>>         >> > code snippets to the docs as well.
> >>>>>>>         >>
> >>>>>>>         >> It's probably a bit more work because there is no section dedicated to
> >>>>>>>         >> state/timer yet in the documentation. Tracked here:
> >>>>>>>         >> https://jira.apache.org/jira/browse/BEAM-2472
> >>>>>>>         >>
> >>>>>>>         >> > I've been going over this topic a bit. I'll add the
> >>>>>>>         >> > snippets next week, if that's fine by y'all.
> >>>>>>>         >>
> >>>>>>>         >> That would be great. The blog posts are a great way to get started with
> >>>>>>>         >> state/timers.
> >>>>>>>         >>
> >>>>>>>         >> Thanks,
> >>>>>>>         >> Max
> >>>>>>>         >>
> >>>>>>>         >> On 11.04.19 20:21, Pablo Estrada wrote:
> >>>>>>>         >> > I've been going over this topic a bit. I'll add the snippets next week,
> >>>>>>>         >> > if that's fine by y'all.
> >>>>>>>         >> > Best
> >>>>>>>         >> > -P.
> >>>>>>>         >> >
> >>>>>>>         >> > On Thu, Apr 11, 2019 at 5:27 AM Robert Bradshaw <rober...@google.com> wrote:
> >>>>>>>         >> >
> >>>>>>>         >> >     That's a great idea! It would probably be pretty easy to add the
> >>>>>>>         >> >     corresponding code snippets to the docs as well.
> >>>>>>>         >> >
> >>>>>>>         >> >     On Thu, Apr 11, 2019 at 2:00 PM Maximilian Michels <m...@apache.org> wrote:
> >>>>>>>         >> >      >
> >>>>>>>         >> >      > Hi everyone,
> >>>>>>>         >> >      >
> >>>>>>>         >> >      > The Python SDK still lacks documentation on state and timers.
> >>>>>>>         >> >      >
> >>>>>>>         >> >      > As a first step, what do you think about updating these two blog posts
> >>>>>>>         >> >      > with the corresponding Python code?
> >>>>>>>         >> >      >
> >>>>>>>         >> >      > https://beam.apache.org/blog/2017/02/13/stateful-processing.html
> >>>>>>>         >> >      > https://beam.apache.org/blog/2017/08/28/timely-processing.html
> >>>>>>>         >> >      >
> >>>>>>>         >> >      > Thanks,
> >>>>>>>         >> >      > Max
> >>>>>>>         >> >
> >>>>>>>         >