A side input window becomes "ready" as soon as a trigger has fired and produced data in the PCollection that the side input view is built over.
For example, suppose the side input is in the global window with an AfterWatermark trigger. The trigger fires once, after all the data along the side input path has been processed, because the watermark jumps from negative infinity to positive infinity. This is the canonical way to load a static dataset for use as a side input in a streaming pipeline. In general, the main input needs to block until at least one pane has been output into the side input PCollection.

On Fri, May 13, 2016 at 7:01 AM, Aljoscha Krettek <[email protected]> wrote:
> Hi,
> in streaming, side input for a window is considered ready as soon as at least one element is ready; this is the same for all kinds of side inputs, i.e. List, Map, Singleton. This means that successive main-input elements can see a different side input List if more side-input elements keep arriving. Side input is also never scoped to a key, but always global (broadcast); that is, if you have a Map you get the whole Map<K, V> from your c.sideInput() call.
>
> At least that's what I gathered from discussions on the ML with Kenneth. And that's why Stephan and I were wondering about the "correctness guarantees" that this gives and whether this is enough for most common use cases.
>
> Cheers,
> Aljoscha
>
> On Fri, 13 May 2016 at 14:44 Maximilian Michels <[email protected]> wrote:
>
> > Hi Stephan,
> >
> > As far as I understand side inputs, by definition, always need to be "ready" before processing of any kind can start. What is considered ready depends on the type of side input. If you use View.asList() or View.asSingleton(), then the whole side input needs to be materialized. On the other hand, if you use View.asIterable(), processing can start once the first element arrives.
> >
> > If the side input itself is windowed, then the notion of "ready" only applies to the individual windows. The side input itself can also be scoped by key if you use the View.asMap() or View.asMultimap() side input views.
> >
> > From a quick look at the InProcessRunner, it appears that processing does not start until the side input of the window is ready. Beam experts, please correct me if I got this wrong.
> >
> > Cheers,
> > Max
> >
> > On Fri, May 13, 2016 at 1:12 PM, Stephan Ewen <[email protected]> wrote:
> > > Hi!
> > >
> > > Aljoscha and I have been going through the side inputs quite a bit, and we were wondering about the following:
> > >
> > > How does one properly join a static data set with a stream?
> > >
> > > This sounds like a job for a side input, but it would require that the side input materializes the initial static data before the main input can begin processing.
> > >
> > > Given that the static data set is in a global window, and the Beam side inputs only wait for the first element in the window to be available, the main input would start joining against the side input prematurely.
> > >
> > > Is this simply considered an uncommon use case, or is there a way to realize this that we overlooked?
> > >
> > > Greetings,
> > > Stephan
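The readiness rule at the top of this reply can be illustrated with a small, single-threaded toy model. Under that rule, the main input waits until at least one pane has been output into the side input PCollection. After that, as Aljoscha notes, successive main-input elements may see different side input contents if more panes fire.

This is not Beam's API or any runner's implementation. SideInputWindow, MainInputProcessor, onPaneFired and onSideInputFired are hypothetical names. The model also assumes the view simply exposes the most recently fired pane.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of side input readiness for ONE window. Hypothetical names only;
// this is not Beam's API and not how any runner actually implements it.
public class SideInputReadinessSketch {

  // Hypothetical stand-in for the materialized side input of a single window.
  static final class SideInputWindow {
    // null means no trigger has fired yet, i.e. the side input is not "ready".
    private List<String> latestPane = null;

    // Called when a trigger fires; assumption: the view exposes only the latest pane.
    void onPaneFired(List<String> paneContents) {
      latestPane = Collections.unmodifiableList(new ArrayList<>(paneContents));
    }

    boolean isReady() {
      return latestPane != null;
    }

    List<String> read() {
      return latestPane;
    }
  }

  // Hypothetical main-input processor for the same window.
  static final class MainInputProcessor {
    private final SideInputWindow side;
    private final List<String> parked = new ArrayList<>();
    final List<String> output = new ArrayList<>();

    MainInputProcessor(SideInputWindow side) {
      this.side = side;
    }

    void onElement(String element) {
      if (!side.isReady()) {
        parked.add(element); // block: hold the element until a pane exists
        return;
      }
      process(element);
    }

    // Called after the side input fires, to drain any held-back elements.
    void onSideInputFired() {
      if (!side.isReady()) {
        return;
      }
      for (String element : parked) {
        process(element);
      }
      parked.clear();
    }

    // Each element is joined against whatever pane is current at this moment.
    private void process(String element) {
      output.add(element + " sees " + side.read());
    }
  }

  public static void main(String[] args) {
    SideInputWindow side = new SideInputWindow();
    MainInputProcessor processor = new MainInputProcessor(side);

    processor.onElement("a");                    // no pane yet: "a" is held back
    side.onPaneFired(Arrays.asList("k1"));       // first pane fires
    processor.onSideInputFired();                // "a" is processed against [k1]
    processor.onElement("b");                    // also sees [k1]
    side.onPaneFired(Arrays.asList("k1", "k2")); // a later pane fires
    processor.onElement("c");                    // sees [k1, k2]

    processor.output.forEach(System.out::println);
  }
}
```

Running main prints "a sees [k1]", "b sees [k1]" and "c sees [k1, k2]". Element a is held back until the first pane exists, and c observes the later, larger pane. In the static-dataset case above, a single AfterWatermark firing in the global window produces only one pane. Every main-input element would then see the complete dataset. The concern Stephan raises is that the rule itself only guarantees that some pane exists, not that it is complete.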
