Re: Flatten input data streams with skewed watermark progress

Reuven Lax Mon, 12 Mar 2018 15:47:29 -0700

Logically a Flatten is just a way to create a multi-input transform
downstream of the flatten (you can imagine a model in which Flatten was not
explicit, we just allowed multiple main inputs). This means that yes, the
watermark is the minimum of all inputs.


I don't see how a late tuple can become early. Can you explain?


On Mon, Mar 12, 2018 at 2:07 PM Shen Li <[email protected]> wrote:

> Hi Reuven,
>
> What about watermark? Should Flatten emit the min watermark of all input
> data streams? If that is the case, one late tuple can become early after
> Flatten, right? Will that cause any problem?
>
> Shen
>
> On Mon, Mar 12, 2018 at 4:09 PM, Reuven Lax <[email protected]> wrote:
>
>> No, I don't think it makes sense for the Flatten operator to cache
>> element.
>>
>>
>> On Mon, Mar 12, 2018 at 11:55 AM Shen Li <[email protected]> wrote:
>>
>>> If multiple inputs of Flatten proceed at different speeds, should the
>>> Flatten operator cache tuples before emitting output watermarks? This can
>>> prevent a late tuple from becoming early. But if the watermark gap (i.e.,
>>> cache size) becomes too large among inputs, can the application tell
>>> Beam/runner to emit output watermark anyway and consider slow input tuples
>>> as late?
>>>
>>> Thanks,
>>> Shen
>>>
>>
>

Re: Flatten input data streams with skewed watermark progress

Reply via email to