Hi Kenn,

Thank you!

Shen

On Tue, Mar 6, 2018 at 5:21 PM, Kenneth Knowles <[email protected]> wrote:

> On Tue, Mar 6, 2018 at 1:06 PM Shen Li <[email protected]> wrote:
>
>> Hi,
>>
>> Should ParDo advance output watermarks based on only the main input or on
>> all inputs? Say the watermark from a side input falls behind: should it
>> block the output watermark of the ParDo?
>>
>
> The rule is that if the user's DoFn might output data with a timestamp,
> that timestamp should be a bound on the output watermark. For side inputs,
> I don't think this is the case. The readiness of the side input plus the
> info in the WindowMappingFn will determine which main elements must be
> pushed back, and this will bound the output watermark.
>
> The exception to the rule is that if data is behind the watermark, it is
> "already late", so it is OK to let the watermark advance because that
> doesn't make it "more late". Instead, apply all the same holding rules to
> GC time so the data doesn't become droppable. The reason for this is that a
> large influx of late data could cause a backlog that prevents more recent
> data from achieving good latency.
>
>> If there are pushed back elements, should the ParDo hold back its output
>> watermarks until corresponding pushed back elements are all processed?
>>
>
> Yes, it should hold the watermark for these.
>
> Kenn
>
>>
>> Thanks,
>> Shen
>>
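
For readers of this thread, the rule Kenn describes can be sketched as a toy model. This is not Beam runner code: the class `ToyParDoWatermark` and its methods are hypothetical names made up for illustration, timestamps are plain integers, and "behind the watermark" is assumed to mean a timestamp strictly less than the main input watermark at the moment the element is pushed back. Note that the side input's watermark never appears in the computation; only pushed-back main elements hold the output watermark.

```python
class ToyParDoWatermark:
    """Toy model of a ParDo's output watermark with pushed-back elements.

    Hypothetical illustration only; not part of any Beam API.
    """

    def __init__(self):
        self.input_watermark = 0
        # Timestamps of on-time main elements waiting for a side input.
        # These hold the output watermark.
        self.holds = []
        # Timestamps of elements that were already late when pushed back.
        # They do not hold the output watermark; a real runner would
        # instead hold garbage collection so they are not dropped.
        self.late_pending = []

    def advance_input(self, watermark):
        # Watermarks only move forward.
        self.input_watermark = max(self.input_watermark, watermark)

    def push_back(self, timestamp):
        # Lateness is decided when the element is pushed back.
        if timestamp < self.input_watermark:
            self.late_pending.append(timestamp)
        else:
            self.holds.append(timestamp)

    def release(self, timestamp):
        # The side input became ready and the element was processed.
        if timestamp in self.holds:
            self.holds.remove(timestamp)
        elif timestamp in self.late_pending:
            self.late_pending.remove(timestamp)

    def output_watermark(self):
        # Bounded by the main input watermark and every on-time hold.
        return min([self.input_watermark] + self.holds)


w = ToyParDoWatermark()
w.advance_input(3)
w.push_back(5)      # on time: holds the output watermark at 5
w.push_back(1)      # already late: no watermark hold
w.advance_input(10)
held = w.output_watermark()      # still held by the element at 5
w.release(5)
released = w.output_watermark()  # free to catch up to the input watermark
```

In this sketch the late element at timestamp 1 never slows the output watermark, which matches the latency argument in the reply above; keeping it from becoming droppable is left out of the model.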
