It is better, and it also fits the following example.
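
A minimal sketch of how I read the proposed wording, in plain Python rather than the Beam API. The `Window` class, the `is_late` helper, the integer timestamps, and the choice to treat "past the end" as `watermark >= window.end` with an exclusive window end are all my own assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical fixed event-time window: start inclusive, end exclusive."""
    start: int
    end: int


def is_late(event_timestamp: int, window: Window, watermark: int) -> bool:
    """Return True if the element falls in `window` and the watermark
    has already progressed past the end of that window."""
    in_window = window.start <= event_timestamp < window.end
    return in_window and watermark >= window.end


w = Window(start=0, end=60)
# Watermark has passed the end of [0, 60), so an element stamped 50 is late.
print(is_late(50, w, watermark=65))   # True
# Watermark is still inside the window, so the same element is on time.
print(is_late(50, w, watermark=55))   # False
# Timestamp is after the watermark, but its window [60, 120) is still open.
print(is_late(70, Window(60, 120), watermark=65))  # False
```

The last call is the case the current sentence gets wrong: a timestamp after the watermark is not late.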

-Rui

On Thu, Jan 17, 2019 at 11:14 AM Jeff Klukas <jklu...@mozilla.com> wrote:

> How about: "Once the watermark progresses past the end of a window, any
> further elements that arrive with a timestamp in that window are considered
> late data."
>
> On Thu, Jan 17, 2019 at 1:43 PM Rui Wang <ruw...@google.com> wrote:
>
>> Hi Community,
>>
>> In the Beam programming guide [1], there is a sentence: "Data that arrives
>> with a timestamp after the watermark is considered *late data*."
>>
>> People seem to be confused by it. For example, see Stack Overflow
>> comment [2]. Basically, it makes people think that an event timestamp that is
>> greater than the watermark is considered late (because of that "after").
>>
>> Although there is an example right after this sentence that explains late
>> data, it seems to me that the sentence is incomplete. A more complete version
>> could be: "The watermark consistently advances from -inf to +inf. Data
>> that arrives with a timestamp after the watermark is considered late data."
>>
>> Am I understanding this correctly? Is there a better description of the
>> ordering between late data and the watermark? I would be happy to send a PR
>> to update the Beam documentation.
>>
>> -Rui
>>
>> [1]: https://beam.apache.org/documentation/programming-guide/#windowing
>> [2]: https://stackoverflow.com/questions/54141352/dataflow-to-process-late-and-out-of-order-data-for-batch-and-stream-messages/54188971?noredirect=1#comment95302476_54188971
>>
>>
>>
