First of all, thanks for the detailed explanation! From my point of view (as a runner developer) this is definitely confusing, especially discovering that an element in an empty window can be dropped at any time. So +1 to Robert's comment about not having this in the public API, and according to Kenneth's lookup it doesn't look too deeply entangled.
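To check that I understood the semantics, here is a toy sketch of the difference between a value in the global window and a value in no windows. This is not the SDK code: WindowSketch, ToyWindowedValue, inGlobalWindow, inEmptyWindows and survivors are hypothetical names I made up for illustration, and windows are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Toy model only: these names are hypothetical and are NOT the SDK's WindowedValue.
final class WindowSketch {

    // Stand-in for the single window that spans all of time.
    static final String GLOBAL_WINDOW = "GlobalWindow";

    // A value paired with the (possibly empty) set of windows it belongs to.
    static final class ToyWindowedValue<T> {
        final T value;
        final Set<String> windows;

        private ToyWindowedValue(T value, Set<String> windows) {
            this.value = value;
            this.windows = windows;
        }

        // Rough analogue of valueInGlobalWindow: the value is in exactly one window.
        static <T> ToyWindowedValue<T> inGlobalWindow(T value) {
            return new ToyWindowedValue<>(value, Collections.singleton(GLOBAL_WINDOW));
        }

        // Rough analogue of valueInEmptyWindows: the value is in no window at all.
        static <T> ToyWindowedValue<T> inEmptyWindows(T value) {
            return new ToyWindowedValue<>(value, Collections.<String>emptySet());
        }
    }

    // Models what a runner is free to do at any point: keep only the values
    // that are assigned to at least one window.
    static <T> List<T> survivors(List<ToyWindowedValue<T>> input) {
        List<T> out = new ArrayList<>();
        for (ToyWindowedValue<T> wv : input) {
            if (!wv.windows.isEmpty()) {
                out.add(wv.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ToyWindowedValue<String>> input = new ArrayList<>();
        input.add(ToyWindowedValue.inGlobalWindow("kept"));
        input.add(ToyWindowedValue.inEmptyWindows("may be dropped"));
        System.out.println(survivors(input)); // prints [kept]
    }
}
```

If that matches the intended behavior, then a runner that filters like survivors() anywhere in the pipeline is still correct, which is exactly the part that surprised me.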
So I guess #valueInGlobalWindow should be the "go-to" default (as long as no "real" windows are involved). Should we consider making this clearer in the public API? Maybe WindowedValue<T>#defaultValue(T), which would probably just put the value in the global window? Just a thought.

On Wed, Apr 13, 2016 at 7:29 PM Robert Bradshaw <[email protected]> wrote:

> As Thomas says, the fact that we ever produce values in "no window" is
> an implementation quirk that should probably be fixed. (IIRC, it's
> used for the output of a GBK before we've done the
> group-also-by-windows to figure out what window it really should be
> in, so "value in unknown windows" would be a better choice).
>
> If a WindowFn doesn't assign a value to any windows, the system is
> free to drop it. There are pros and cons to supporting this degenerate
> case vs. making it an error. However, this should almost certainly not
> be in the public API...
>
> - Robert
>
>
> On Wed, Apr 13, 2016 at 9:06 AM, Thomas Groh <[email protected]> wrote:
> > Actually, my above claim isn't as strong as it can be.
> >
> > A value in no windows is considered to not exist. Values that are not
> > assigned to any window can be dropped by a runner at *any time*. A WindowFn
> > *must* assign all elements to at least one window. All elements that are
> > produced by any PTransform (including Sources) must be in a window,
> > potentially the GlobalWindow.
> >
> > On Wed, Apr 13, 2016 at 8:52 AM, Thomas Groh <[email protected]> wrote:
> >
> >> Values should almost always be part of at least one window. WindowFns
> >> should place all elements in at least one window, as values that are in no
> >> windows will be dropped when they reach a GroupByKey.
> >>
> >> Elements in no windows, for example those created by
> >> WindowedValue.valueInEmptyWindows(T) are generally an implementation
> >> detail of a transform; for example, in the InProcessPipelineRunner, the
> >> KV<K, Iterable<WindowedValue<V>>> elements output by a GroupByKeyOnly are in
> >> empty windows - but by the time the element reaches the boundary of the
> >> GroupByKey, the elements are reassigned to the appropriate window(s).
> >>
> >> On Tue, Apr 12, 2016 at 11:44 PM, Amit Sela <[email protected]> wrote:
> >>
> >>> My instinct tells me that if a value does not belong to a specific window
> >>> (in time) it's a part of a global window, but if so, what's the role of
> >>> the "empty window". When should an element be a "value in an empty window"?
