Completely agree that CombiningState is nicer in this example. Users may still want to use ValueState when there is nothing to combine. Also, users already know ValueState from the Java SDK.

On 25.04.19 17:12, Robert Bradshaw wrote:
On Thu, Apr 25, 2019 at 4:58 PM Maximilian Michels <m...@apache.org> wrote:

I forgot to give an example, just to clarify for others:

What was the specific example that was less natural?

Basically every time we use ListState to express ValueState, e.g.

    next_index, = list(state.read()) or [0]

Taken from:
https://github.com/apache/beam/pull/8363/files#diff-ba1a2aed98079ccce869cd660ca9d97dR301

Yes, ListState is much less natural here. I think CombiningValue is
often a better replacement. E.g. the Java example reads


public void processElement(
       ProcessContext context, @StateId("index") ValueState<Integer> index) {
     int current = firstNonNull(index.read(), 0);
     context.output(KV.of(current, context.element()));
     index.write(current+1);
}


which is replaced with bag state


def process(self, element, state=DoFn.StateParam(INDEX_STATE)):
     next_index, = list(state.read()) or [0]
     yield (element, next_index)
     state.clear()
     state.add(next_index + 1)


whereas CombiningState would be more natural (than ListState, and
arguably than even ValueState), giving


def process(self, element, index=DoFn.StateParam(INDEX_STATE)):
     yield element, index.read()
     index.add(1)
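
For readers without a runner at hand, the difference between the two Python snippets above can be sketched in plain Python. This is only an illustration: `ToyBagState` and `ToyCombiningState` are hypothetical stand-ins invented here, not Beam classes, and they mimic only the `read`/`add`/`clear` calls used in the snippets.

```python
class ToyBagState:
    """Hypothetical stand-in for a bag/list state cell (not a Beam class)."""

    def __init__(self):
        self._items = []

    def read(self):
        return iter(self._items)

    def add(self, value):
        self._items.append(value)

    def clear(self):
        self._items = []


class ToyCombiningState:
    """Hypothetical stand-in for a combining state cell (not a Beam class)."""

    def __init__(self, combine=sum):
        self._combine = combine
        self._inputs = []

    def read(self):
        # sum([]) is 0, so no explicit default is needed for the first read.
        return self._combine(self._inputs)

    def add(self, value):
        self._inputs.append(value)


def index_with_bag(element, state):
    # Bag idiom: unpack the single stored value, or fall back to 0.
    next_index, = list(state.read()) or [0]
    state.clear()
    state.add(next_index + 1)
    return element, next_index


def index_with_combining(element, index):
    # Combining idiom: read the running sum, then add 1 to it.
    current = index.read()
    index.add(1)
    return element, current


bag, combining = ToyBagState(), ToyCombiningState()
bag_result = [index_with_bag(e, bag) for e in "abc"]
combining_result = [index_with_combining(e, combining) for e in "abc"]
```

Both lists pair each element with a running index starting at 0; the combining version needs neither the default-value fallback nor the clear-before-write step.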





-Max

On 25.04.19 16:40, Robert Bradshaw wrote:
https://github.com/apache/beam/pull/8402

On Thu, Apr 25, 2019 at 4:26 PM Robert Bradshaw <rober...@google.com> wrote:

Oh, this is for the indexing example.

I actually think using CombiningState is cleaner than ValueState.

https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L262

(The fact that one must specify the accumulator coder is, however,
unfortunate. We should probably infer that if we can.)

On Thu, Apr 25, 2019 at 4:19 PM Robert Bradshaw <rober...@google.com> wrote:

The desire was to avoid the implicit disallowed combination wart in
Python (until we could make sense of it), and also ValueState could be
surprising with respect to older values overwriting newer ones. What
was the specific example that was less natural?

On Thu, Apr 25, 2019 at 3:01 PM Maximilian Michels <m...@apache.org> wrote:

@Pablo: Thanks for following up with the PR! :)

@Brian: I was wondering about this as well. It makes the Python state
code a bit unnatural. I'd suggest adding a ValueState wrapper around
ListState/CombiningState.
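
A minimal sketch of what such a wrapper could look like, written in plain Python rather than against the Beam API. All names here (`ToyBagState`, `ValueStateWrapper`, the `default` parameter) are hypothetical and chosen for illustration; the only assumption about the underlying state is that it offers `read`, `add` and `clear`.

```python
class ToyBagState:
    """Hypothetical stand-in for a bag/list state cell (not a Beam class)."""

    def __init__(self):
        self._items = []

    def read(self):
        return iter(self._items)

    def add(self, value):
        self._items.append(value)

    def clear(self):
        self._items = []


class ValueStateWrapper:
    """Hypothetical ValueState-style facade over a bag-like state."""

    def __init__(self, bag, default=None):
        self._bag = bag
        self._default = default

    def read(self):
        # The bag holds at most one item because write() clears it first.
        values = list(self._bag.read())
        return values[-1] if values else self._default

    def write(self, value):
        # Overwrite semantics: drop whatever was stored, then store the new value.
        self._bag.clear()
        self._bag.add(value)

    def clear(self):
        self._bag.clear()


index = ValueStateWrapper(ToyBagState(), default=0)
first = index.read()
index.write(first + 1)
second = index.read()
```

The wrapper hides the clear-then-add idiom behind `write`, so user code reads like the Java ValueState example.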

@Robert: Like Reuven pointed out, we can disallow ValueState for merging
windows with state.

@Reza: Great. Let's make sure it has Python examples out of the box.
Either Pablo or I could help there.

Thanks,
Max

On 25.04.19 04:14, Reza Ardeshir Rokni wrote:
Pablo, Kenneth and I have a new blog post ready for publication which
covers how to create a "looping timer", which allows default values to
be created in a window when no incoming elements exist. We just need to
clear a few bits before publication, but it would be great to have that
also include a Python example; I wrote it in Java...

Cheers

Reza

On Thu, 25 Apr 2019 at 04:34, Reuven Lax <re...@google.com> wrote:

      Well, state is still not implemented for merging windows even for
      Java (though I believe the idea was to disallow ValueState there).

      On Wed, Apr 24, 2019 at 1:11 PM Robert Bradshaw <rober...@google.com> wrote:

          It was unclear what the semantics were for ValueState for merging
          windows. (It's also a bit weird as it's inherently a race condition
          wrt element ordering, unlike Bag and CombineState, though you can
          always implement it as a CombineState that always returns the latest
          value, which is a bit more explicit about the dangers here.)
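
A rough sketch of the "latest value" combiner idea, in plain Python. The class below only mirrors the four method names of Beam's `CombineFn` interface (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`) and does not subclass it; the class name and the `(timestamp, value)` element shape are assumptions made for illustration, not something specified in this thread.

```python
class LatestValueFn:
    """Hypothetical combiner that keeps the value with the greatest timestamp.

    Elements are assumed to be (timestamp, value) pairs; the accumulator is
    either None or the latest pair seen so far.
    """

    def create_accumulator(self):
        return None

    def add_input(self, accumulator, element):
        # Keep the element only if it is at least as recent as what we hold.
        if accumulator is None or element[0] >= accumulator[0]:
            return element
        return accumulator

    def merge_accumulators(self, accumulators):
        merged = None
        for acc in accumulators:
            if acc is not None:
                merged = self.add_input(merged, acc)
        return merged

    def extract_output(self, accumulator):
        return None if accumulator is None else accumulator[1]


fn = LatestValueFn()

in_order = fn.create_accumulator()
for element in [(1, "a"), (2, "b"), (3, "c")]:
    in_order = fn.add_input(in_order, element)

out_of_order = fn.create_accumulator()
for element in [(3, "c"), (1, "a"), (2, "b")]:
    out_of_order = fn.add_input(out_of_order, element)

latest_in_order = fn.extract_output(in_order)
latest_out_of_order = fn.extract_output(out_of_order)
latest_merged = fn.extract_output(fn.merge_accumulators([(1, "a"), None, (3, "c")]))
```

Because "latest" is decided by an explicit timestamp rather than by arrival order, the result is the same for in-order input, out-of-order input, and merged accumulators, which is the ordering concern being made explicit.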

          On Wed, Apr 24, 2019 at 10:08 PM Brian Hulette <bhule...@google.com> wrote:
           >
           > That's a great idea! I thought about this too after those posts
           > came up on the list recently. I started to look into it, but I
           > noticed that there's actually no implementation of ValueState in
           > userstate. Is there a reason for that? I started to work on a
           > patch to add it but I was just curious if there was some reason
           > it was omitted that I should be aware of.
           >
           > We could certainly replicate the example without ValueState by
           > using BagState and clearing it before each write, but it would be
           > nice if we could draw a direct parallel.
           >
           > Brian
           >
           > On Fri, Apr 12, 2019 at 7:05 AM Maximilian Michels <m...@apache.org> wrote:
           >>
           >> > It would probably be pretty easy to add the corresponding
           >> > code snippets to the docs as well.
           >>
           >> It's probably a bit more work because there is no section
           >> dedicated to state/timer yet in the documentation. Tracked here:
           >> https://jira.apache.org/jira/browse/BEAM-2472
           >>
           >> > I've been going over this topic a bit. I'll add the snippets
           >> > next week, if that's fine by y'all.
           >>
           >> That would be great. The blog posts are a great way to get
           >> started with state/timers.
           >>
           >> Thanks,
           >> Max
           >>
           >> On 11.04.19 20:21, Pablo Estrada wrote:
           >> > I've been going over this topic a bit. I'll add the snippets
           >> > next week, if that's fine by y'all.
           >> > Best
           >> > -P.
           >> >
           >> > On Thu, Apr 11, 2019 at 5:27 AM Robert Bradshaw <rober...@google.com> wrote:
           >> >
           >> >     That's a great idea! It would probably be pretty easy to add
           >> >     the corresponding code snippets to the docs as well.
           >> >
           >> >     On Thu, Apr 11, 2019 at 2:00 PM Maximilian Michels <m...@apache.org> wrote:
           >> >      >
           >> >      > Hi everyone,
           >> >      >
           >> >      > The Python SDK still lacks documentation on state and timers.
           >> >      >
           >> >      > As a first step, what do you think about updating these two
           >> >      > blog posts with the corresponding Python code?
           >> >      >
           >> >      > https://beam.apache.org/blog/2017/02/13/stateful-processing.html
           >> >      > https://beam.apache.org/blog/2017/08/28/timely-processing.html
           >> >      >
           >> >      > Thanks,
           >> >      > Max
           >> >
