Good thread. Filed as https://issues.apache.org/jira/browse/BEAM-191.

On Wed, Apr 13, 2016 at 10:08 AM, Amit Sela <[email protected]> wrote:

> First of all, thanks for the detailed explanation!
>
> I can say that from my point of view (as a runner developer) this is
> definitely confusing, especially discovering that an element in an empty
> window can be dropped at any time, so +1 for Robert's comment on not having
> this in the public API. According to Kenneth's lookup, it doesn't look like
> it's entangled too deeply.
>
> So I guess #valueInGlobalWindow should be the "go-to" default (as long as
> no "real" windows are involved). Should we consider making this clearer in
> the public API? Maybe WindowedValue<T>#defaultValue(T), which would
> probably place the value in the global window? Just a thought.
>
> On Wed, Apr 13, 2016 at 7:29 PM Robert Bradshaw <[email protected]> wrote:
>
> > As Thomas says, the fact that we ever produce values in "no window" is
> > an implementation quirk that should probably be fixed. (IIRC, it's
> > used for the output of a GBK before we've done the
> > group-also-by-windows to figure out what window it really should be
> > in, so "value in unknown windows" would be a better choice).
> >
> > If a WindowFn doesn't assign a value to any windows, the system is
> > free to drop it. There are pros and cons to supporting this degenerate
> > case vs. making it an error. However, this should almost certainly not
> > be in the public API...
> >
> > - Robert
> >
> >
> > On Wed, Apr 13, 2016 at 9:06 AM, Thomas Groh <[email protected]> wrote:
> > > Actually, my claim above isn't as strong as it could be.
> > >
> > > A value in no windows is considered to not exist. Values that are not
> > > assigned to any window can be dropped by a runner at *any time*. A
> > > WindowFn *must* assign all elements to at least one window. All
> > > elements that are produced by any PTransform (including Sources) must
> > > be in a window, potentially the GlobalWindow.
> > >
> > > On Wed, Apr 13, 2016 at 8:52 AM, Thomas Groh <[email protected]> wrote:
> > >
> > >> Values should almost always be part of at least one window. WindowFns
> > >> should place all elements in at least one window, as values that are
> > >> in no windows will be dropped when they reach a GroupByKey.
> > >>
> > >> Elements in no windows, for example those created by
> > >> WindowedValue.valueInEmptyWindows(T), are generally an implementation
> > >> detail of a transform. For example, in the InProcessPipelineRunner, the
> > >> KV<K, Iterable<WindowedValue<V>>> elements output by a GroupByKeyOnly
> > >> are in empty windows, but by the time the elements reach the boundary
> > >> of the GroupByKey, they are reassigned to the appropriate window(s).
> > >>
> > >> On Tue, Apr 12, 2016 at 11:44 PM, Amit Sela <[email protected]> wrote:
> > >>
> > >>> My instinct tells me that if a value does not belong to a specific
> > >>> window (in time) it's part of a global window, but if so, what's the
> > >>> role of the "empty window"? When should an element be a "value in an
> > >>> empty window"?
> > >>>
> > >>
> > >>
> >
>
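
To make the semantics described in this thread concrete, here is a minimal, self-contained Java sketch. It does not use Beam at all: Windowed, inGlobalWindow, inEmptyWindows, groupByKeyAndWindow and checkAssigned are hypothetical stand-ins loosely modeled on WindowedValue.valueInGlobalWindow(T), WindowedValue.valueInEmptyWindows(T) and GroupByKey, and windows are modeled as plain strings. It illustrates why a value in no windows effectively does not exist: grouping happens per key and window, so an element with an empty window set contributes to no group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the windowing semantics discussed above.
// None of these types or methods are the real Beam classes.
public class EmptyWindowSketch {

    // Stand-in for the single global window covering all time.
    static final String GLOBAL_WINDOW = "GlobalWindow";

    // A value tagged with the set of windows it belongs to.
    record Windowed<T>(T value, Set<String> windows) {

        // Analogous to valueInGlobalWindow(T): the safe default.
        static <T> Windowed<T> inGlobalWindow(T value) {
            return new Windowed<>(value, Set.of(GLOBAL_WINDOW));
        }

        // Analogous to valueInEmptyWindows(T): a value in no window,
        // which a runner may treat as nonexistent.
        static <T> Windowed<T> inEmptyWindows(T value) {
            return new Windowed<>(value, Set.of());
        }
    }

    // Toy GroupByKey: groups values per (key, window) pair. An element in no
    // windows yields no (key, window) pair, so it silently disappears.
    static <V> Map<String, List<V>> groupByKeyAndWindow(
            List<Map.Entry<String, Windowed<V>>> input) {
        Map<String, List<V>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Windowed<V>> e : input) {
            for (String window : e.getValue().windows()) {
                groups.computeIfAbsent(e.getKey() + "@" + window, k -> new ArrayList<>())
                      .add(e.getValue().value());
            }
        }
        return groups;
    }

    // Alternative policy: treat a WindowFn that assigns an element to no
    // windows as an error instead of silently dropping the element.
    static <T> Windowed<T> checkAssigned(Windowed<T> wv) {
        if (wv.windows().isEmpty()) {
            throw new IllegalStateException("element assigned to no windows: " + wv.value());
        }
        return wv;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Windowed<Integer>>> input = List.of(
            Map.entry("a", Windowed.inGlobalWindow(1)),
            Map.entry("a", Windowed.inEmptyWindows(2)),   // dropped by the grouping
            Map.entry("b", Windowed.inGlobalWindow(3)));
        // Prints {a@GlobalWindow=[1], b@GlobalWindow=[3]}
        System.out.println(groupByKeyAndWindow(input));
    }
}
```

checkAssigned corresponds to the other option Robert mentions, making the degenerate "no windows" case an error rather than allowing the system to drop the element; the thread leaves open which policy is preferable.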
