On Tue, Mar 6, 2018 at 5:21 PM, Kenneth Knowles <k...@google.com> wrote:
> On Tue, Mar 6, 2018 at 1:06 PM Shen Li <cs.she...@gmail.com> wrote:
>> Should ParDo advance output watermarks based only on the main input or on
>> all inputs? Say the watermark from a side input falls behind, should it
>> block the output watermark of the ParDo?
> The rule is that if the user's DoFn might output data with a timestamp,
> that timestamp should be a bound on the output watermark. For side inputs,
> I don't think this is the case. The readiness of the side input plus the
> info in the WindowMappingFn will determine which main elements must be
> pushed back, and this will bound the output watermark.
> The exception to the rule is that if data is behind the watermark, it is
> "already late", so it is OK to let the watermark advance because that
> doesn't make it "more late". Instead, apply all the same holding rules to
> GC time so the data doesn't become droppable. The reason for this is that a
> large influx of late data could cause a backlog that prevents more recent
> data from achieving good latency.
>> If there are pushed back elements, should the ParDo hold back its output
>> watermarks until corresponding pushed back elements are all processed?
> Yes, it should hold the watermark for these.
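
To make the holding rule above concrete, here is a minimal sketch in plain
Python. It is a toy model, not Beam runner code: the class name
ParDoWatermarkModel and its methods are hypothetical, timestamps are plain
integers, and the GC-time hold for late data is only noted in a comment, not
modeled. It assumes the runner has already decided which main-input elements
to push back (from side input readiness plus the WindowMappingFn), so the side
input watermark never appears directly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParDoWatermarkModel:
    """Toy model of a ParDo output watermark (hypothetical, not a Beam API)."""

    main_input_watermark: int = 0
    output_watermark: int = 0
    # Timestamps of main-input elements waiting on a side input.
    pushed_back_timestamps: List[int] = field(default_factory=list)

    def push_back(self, timestamp: int) -> None:
        """Record a main-input element whose side input is not ready yet."""
        self.pushed_back_timestamps.append(timestamp)

    def release(self, timestamp: int) -> None:
        """Record that a pushed-back element has now been processed."""
        self.pushed_back_timestamps.remove(timestamp)

    def advance(self, new_main_input_watermark: int) -> int:
        """Advance the main input watermark and recompute the output watermark."""
        self.main_input_watermark = max(
            self.main_input_watermark, new_main_input_watermark
        )
        # Elements already behind the output watermark are "already late" and
        # do not hold it. A real runner would apply the same hold to GC time
        # instead, which this sketch does not model.
        holds = [
            t for t in self.pushed_back_timestamps if t >= self.output_watermark
        ]
        candidate = min([self.main_input_watermark] + holds)
        # The output watermark never moves backwards.
        self.output_watermark = max(self.output_watermark, candidate)
        return self.output_watermark
```

In this model, an element pushed back at timestamp 10 keeps the output
watermark at 10 even if the main input watermark reaches 50. Once that element
is released, the output watermark can catch up to the main input watermark.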