Hi,

In streaming, the side input for a window is considered ready as soon as at least one element for that window has arrived. This is the same for all kinds of side inputs, i.e. List, Map and Singleton. It means that successive main-input elements can see a different side-input List if more side-input elements keep arriving. Side input is also never scoped to a key; it is always global (broadcast). That is, if you have a Map, you get the whole Map<K, V> from your c.sideInput() call.
At least that's what I gathered from discussions with Kenneth on the ML. That's why Stephan and I were wondering about the "correctness guarantees" this gives and whether it is enough for the most common use cases.

Cheers,
Aljoscha

On Fri, 13 May 2016 at 14:44 Maximilian Michels <[email protected]> wrote:
> Hi Stephan,
>
> As far as I understand, side inputs by definition always need to be
> "ready" before processing of any kind can start. What is considered
> ready depends on the type of side input. If you use View.asList() or
> View.asSingleton(), then the whole side input needs to be materialized.
> On the other hand, if you use View.asIterable(), processing can start
> once the first element arrives.
>
> If the side input itself is windowed, then the notion of "ready" only
> applies to the individual windows. Side input itself can also be
> scoped by key if you use the View.asMap() or View.asMultimap() side
> input views.
>
> From a quick look at the InProcessRunner, it appears that processing
> does not start until the side input of the window is ready. Beam
> experts, please correct me if I got this wrong.
>
> Cheers,
> Max
>
> On Fri, May 13, 2016 at 1:12 PM, Stephan Ewen <[email protected]> wrote:
> > Hi!
> >
> > Aljoscha and I have been going through the side inputs quite a bit, and
> > we were wondering about the following:
> >
> > How does one properly join a static data set with a stream?
> >
> > This sounds like a job for a side input, but it would require that the
> > side input materializes the initial static data before the main input
> > can begin processing.
> >
> > Given that the static data set is in a global window, and the Beam side
> > inputs only wait for the first element in the window to be available,
> > the main input would start joining against the side input prematurely.
> >
> > Is this simply considered an uncommon use case, or is there a way to
> > realize this that we overlooked?
> >
> > Greetings,
> > Stephan
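To make the concern concrete, here is a toy, stdlib-only Java model of the readiness rule as Aljoscha describes it (a window's side input counts as ready once one element has arrived). It is a sketch under that assumption only: ToyRunner, WindowSideInput and simulate() are hypothetical names, not Beam API, and this is not how any actual runner is implemented. Main-input elements that arrive before the side input is ready are held back; after that, each one joins against whatever partial side-input contents exist at that moment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model (NOT Beam API) of "side input is ready once at
// least one element for the window has arrived".
public class SideInputReadiness {

    // Side-input contents for a single window (e.g. the global window).
    static final class WindowSideInput {
        private final List<String> elements = new ArrayList<>();

        void add(String element) { elements.add(element); }

        // Assumed rule: ready as soon as one element has arrived.
        boolean isReady() { return !elements.isEmpty(); }

        // Copy of the contents visible right now (possibly incomplete).
        List<String> view() { return new ArrayList<>(elements); }
    }

    // Hypothetical runner: buffers main input until the side input is ready.
    static final class ToyRunner {
        final WindowSideInput side = new WindowSideInput();
        final List<String> pending = new ArrayList<>();
        final List<String> seen = new ArrayList<>();

        void onMain(String m) {
            if (side.isReady()) {
                seen.add(m + " -> " + side.view());
            } else {
                pending.add(m); // hold back until the window is ready
            }
        }

        void onSide(String s) {
            side.add(s);
            // Release held-back elements against the current, partial view.
            for (String m : pending) {
                seen.add(m + " -> " + side.view());
            }
            pending.clear();
        }
    }

    // Records which side-input contents each main-input element joined against.
    static List<String> simulate() {
        ToyRunner r = new ToyRunner();
        r.onMain("main0"); // side input not ready yet: buffered
        r.onSide("a");     // first static element arrives: window is "ready"
        r.onMain("main1"); // joins against the partial static data
        r.onSide("b");     // more static data keeps arriving
        r.onMain("main2"); // sees a different List than main1 did
        return r.seen;
    }

    public static void main(String[] args) {
        for (String line : simulate()) {
            System.out.println(line);
        }
    }
}
```

Running main prints `main0 -> [a]`, `main1 -> [a]` and `main2 -> [a, b]`: under this rule, main0 and main1 are joined before the static data set is fully loaded, which is the premature join Stephan asks about.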
