Re: [DISCUSS] Adding Some Sort of SideInputRunner

Kenneth Knowles Tue, 03 May 2016 10:40:23 -0700

I think the answer to your questions might be StateNamespace.

The lowest level of state is always key-scoped, while the StateNamespace
indicates whether it is global to the key, further scoped to a particular
window, or even scoped to a particular trigger. When the DoFn needs a side
input, the key might actually be gone from the user's point of view. It is
up to the StepContext to provide an appropriately-scoped StateInternals,
usually by some consistent sharding key such as the key from the upstream
GBK.
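
To make the scoping concrete, here is a rough toy model in Python. It is
only a sketch of the idea, not the actual SDK classes: Namespace,
ToyStateInternals, ToyStepContext, and the tag names are all hypothetical
names made up for illustration, and the sharding key is assumed to be a
plain string.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Namespace:
    """Hypothetical stand-in for a state namespace.

    No window and no trigger: global to the key.
    Window only: scoped to that window.
    Window and trigger: scoped to one trigger in that window.
    """
    window: Optional[str] = None
    trigger: Optional[int] = None


GLOBAL = Namespace()


class ToyStateInternals:
    """All state for ONE sharding key; cells are addressed by (namespace, tag)."""

    def __init__(self, key: str) -> None:
        self.key = key
        self._cells: Dict[Tuple[Namespace, str], List[Any]] = {}

    def bag(self, namespace: Namespace, tag: str) -> List[Any]:
        # Return the bag for this (namespace, tag), creating it on first use.
        return self._cells.setdefault((namespace, tag), [])


class ToyStepContext:
    """Hands out one ToyStateInternals per sharding key (assumed: upstream GBK key)."""

    def __init__(self) -> None:
        self._per_key: Dict[str, ToyStateInternals] = {}

    def state_internals(self, sharding_key: str) -> ToyStateInternals:
        if sharding_key not in self._per_key:
            self._per_key[sharding_key] = ToyStateInternals(sharding_key)
        return self._per_key[sharding_key]


ctx = ToyStepContext()
state = ctx.state_internals("k1")

# Buffered main-input elements live in a window-scoped namespace ...
window = Namespace(window="[0, 10)")
state.bag(window, "buffered-main-input").append("element-a")

# ... while state that is global to the key uses the global namespace.
state.bag(GLOBAL, "dofn-state").append(42)
```

In this model both kinds of state come from the same per-key object, yet
they never collide, because the namespace is part of the address of every
cell.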


I don't want to go too far into state accessed in the DoFn, as I haven't
yet had a chance to prepare and publish the design doc for that, and I want
everyone to have access to it for any discussion.

Does this help?

On Tue, May 3, 2016 at 1:58 AM, Aljoscha Krettek <[email protected]>
wrote:

> I'm afraid I have yet another question. What's the interplay between the
> state that holds the buffered main-input elements and possible per-key
> state that might be used by the DoFn? I guess I'm not seeing all the parts,
> but my problem is that one part (the buffering) requires a different type
> of state scope than the other part (key-scoped state access in the DoFn),
> while they both seem to be using the same StateInternals from the step
> context. How does that work?
>
> Cheers,
> Aljoscha
>
> On Thu, 28 Apr 2016 at 20:05 Kenneth Knowles <[email protected]>
> wrote:
>
> > On Thu, Apr 28, 2016 at 10:19 AM, Aljoscha Krettek <[email protected]>
> > wrote:
> >
> > > No worries :-) and thanks for the detailed answers!
> > >
> > > I still have one question, though: you wrote that "The side input is
> > > considered ready when there has been any data output/added to the
> > > PCollection that it is being read as a side input. So the upstream
> > > trigger controls this." How does this work with side inputs that
> > > consist of multiple elements, i.e. ListPCollectionView and
> > > MapPCollectionView. For them, do we also consider the side input as
> > > ready once the first element arrives? That's why I was wondering about
> > > the triggers being responsible for deciding when a side input is ready.
> > >
> >
> > Yes, just as you describe. The side input window becomes ready once it
> > has any data. So, combining your items 2.5 and 3, you have a situation
> > where main input elements may be combined with only a speculative subset
> > of the side input data. They will not be reprocessed once more up-to-date
> > side input values become known. Beyond this initial period of waiting for
> > the very first firing of the side input window, there are no consistency
> > restrictions/guarantees on main input vs side input windows or
> > triggerings. It may be that for a given runner updating the side input
> > with the new value happens at high latency so all the main input elements
> > are processed and gone before the update goes through. It is a bit of a
> > dangerous area for users. I'm pretty interested in ideas in this space.
> >
> > Kenn
> >
>
