Good thread. Filed as https://issues.apache.org/jira/browse/BEAM-191.

On Wed, Apr 13, 2016 at 10:08 AM, Amit Sela <[email protected]> wrote:

> First of all, Thanks for the detailed explanation!
>
> I can say that from my point of view (as a runner developer) this is
> definitely confusing, especially discovering that an element in an empty
> window can be dropped at anytime, so +1 for Robert's comment on not having
> this public API, and according to Kenneth's lookup it looks like it's not
> entangled too deep.
>
> So I guess #valueInGlobalWindow should be the "go-to" default window (as
> long as no "real" windows are involved), should we consider making this
> more clear in the public API ? maybe WindowedValue<T>#defaultValue(T) ?
> which will probably implement a global window.. just a thought.
>
> On Wed, Apr 13, 2016 at 7:29 PM Robert Bradshaw <[email protected]> wrote:
>
> > As Thomas says, the fact that we ever produce values in "no window" is
> > an implementation quirk that should probably be fixed. (IIRC, it's
> > used for the output of a GBK before we've done the
> > group-also-by-windows to figure out what window it really should be
> > in, so "value in unknown windows" would be a better choice).
> >
> > If a WindowFn doesn't assign a value to any windows, the system is
> > free to drop it. There are pros and cons to supporting this degenerate
> > case vs. making it an error. However, this should almost certainly not
> > be in the public API...
> >
> > - Robert
> >
> >
> > On Wed, Apr 13, 2016 at 9:06 AM, Thomas Groh <[email protected]> wrote:
> > > Actually, my above claim isn't as strong as it can be.
> > >
> > > A value in no windows is considered to not exist. Values that are not
> > > assigned to any window can be dropped by a runner at *any time*. A
> > > WindowFn *must* assign all elements to at least one window. All
> > > elements that are produced by any PTransform (including Sources) must
> > > be in a window, potentially the GlobalWindow.
> > >
> > > On Wed, Apr 13, 2016 at 8:52 AM, Thomas Groh <[email protected]> wrote:
> > >
> > >> Values should almost always be part of at least one window. WindowFns
> > >> should place all elements in at least one window, as values that are
> > >> in no windows will be dropped when they reach a GroupByKey.
> > >>
> > >> Elements in no windows, for example those created by
> > >> WindowedValue.valueInEmptyWindows(T) are generally an implementation
> > >> detail of a transform; for example, in the InProcessPipelineRunner,
> > >> the KV<K, Iterable<WindowedValue<V>>> elements output by a
> > >> GroupByKeyOnly are in empty windows - but by the time the element
> > >> reaches the boundary of the GroupByKey, the elements are reassigned
> > >> to the appropriate window(s).
> > >>
> > >> On Tue, Apr 12, 2016 at 11:44 PM, Amit Sela <[email protected]> wrote:
> > >>
> > >>> My instinct tells me that if a value does not belong to a specific
> > >>> window (in time) it's a part of a global window, but if so, what's
> > >>> the role of the "empty window". When should an element be a "value
> > >>> in an empty window" ?
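
For anyone reading the thread later, here is a minimal Java sketch of the rule Thomas describes: a value in the global window survives, while a value in no windows may be dropped. It does not use the SDK. WindowSketch, Value, inGlobalWindow, inEmptyWindows and survivingValues are hypothetical names made up for illustration, windows are modeled as plain strings, and dropping at a single grouping step is an assumption (per the thread, a runner may drop such values at any time).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these types are the real SDK classes.
final class WindowSketch {
    static final String GLOBAL_WINDOW = "GlobalWindow";

    // Stand-in for a windowed value: a payload plus the windows it belongs to.
    static final class Value<T> {
        final T payload;
        final Collection<String> windows;

        private Value(T payload, Collection<String> windows) {
            this.payload = payload;
            this.windows = windows;
        }

        // Same idea as valueInGlobalWindow: exactly one window, the global one.
        static <T> Value<T> inGlobalWindow(T payload) {
            return new Value<>(payload, Collections.singletonList(GLOBAL_WINDOW));
        }

        // Same idea as valueInEmptyWindows: the value is in no windows at all.
        static <T> Value<T> inEmptyWindows(T payload) {
            return new Value<>(payload, Collections.<String>emptyList());
        }
    }

    // Assumption: model the drop at one grouping boundary. Values in no
    // windows are treated as if they do not exist.
    static <T> List<T> survivingValues(List<Value<T>> input) {
        List<T> out = new ArrayList<>();
        for (Value<T> v : input) {
            if (!v.windows.isEmpty()) {
                out.add(v.payload);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Value<String>> input = Arrays.asList(
            Value.inGlobalWindow("kept"),
            Value.inEmptyWindows("dropped"));
        System.out.println(survivingValues(input)); // prints [kept]
    }
}
```

This is also why a global-window default, as Amit suggests, looks safer for runner authors than an empty-window one: the element stays visible to every downstream step.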
