The default timestamp should be BoundedWindow.TIMESTAMP_MIN_VALUE, which is
equivalent to -2**63 microseconds. We also occasionally refer to this
timestamp as "negative infinity".
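As a rough illustration of that value, here is a minimal sketch in plain Java with no Beam dependency. It assumes the SDK stores timestamps as millisecond instants, so -2**63 microseconds gets truncated to whole milliseconds. The class and method names are hypothetical and are not part of Beam.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Beam code: computes the millisecond value that
// corresponds to -2**63 microseconds ("negative infinity" in this thread).
class MinTimestampSketch {

    // Assumption: timestamps are kept at millisecond precision, so the
    // microsecond value is converted (and truncated) to milliseconds.
    static long minTimestampMillis() {
        return TimeUnit.MICROSECONDS.toMillis(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        System.out.println("min timestamp (ms since epoch): " + minTimestampMillis());
    }
}
```

Note that the result is far below 0, so an element with no explicit timestamp sits well before the epoch rather than at it.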

The default watermark policy for a bounded source should be negative
infinity until all of the data has been read, then positive infinity. There
isn't really a default watermark policy for an unbounded source: the
watermark depends on the data that hasn't yet been read from that source,
so it depends on where you're reading from.
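The bounded-source policy can be sketched as a toy model in plain Java. This is not how any runner implements it: the class, the method, and the use of Long.MIN_VALUE / Long.MAX_VALUE as stand-ins for negative and positive infinity are all assumptions made for illustration.

```java
// Hypothetical toy model, not Beam code: the watermark of a bounded source
// stays at "negative infinity" until every element has been read, then
// jumps to "positive infinity".
class BoundedWatermarkSketch {

    // Stand-ins for the minimum and maximum timestamps (assumed values).
    static final long NEGATIVE_INFINITY = Long.MIN_VALUE;
    static final long POSITIVE_INFINITY = Long.MAX_VALUE;

    // elementsRead: how many elements have been consumed so far.
    // totalElements: the (finite) size of the bounded source.
    static long watermark(int elementsRead, int totalElements) {
        return elementsRead < totalElements ? NEGATIVE_INFINITY : POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        int total = 3;
        for (int read = 0; read <= total; read++) {
            System.out.println("read " + read + "/" + total
                    + " -> watermark " + watermark(read, total));
        }
    }
}
```

An unbounded source has no such final step, which is why its watermark has to come from knowledge of the system being read.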

Currently, modifying the timestamp of an element from within a DoFn does
not modify the watermark. Moving a timestamp forwards in time is generally
"safe", as it can't cause data to move behind the watermark. Moving
elements backwards in time can, which is why it requires setting
"withAllowedTimestampSkew". That setting also doesn't modify the watermark,
so elements that are moved backwards in time can become late and be dropped
by a runner. I don't think we currently have any changes in-flight to make
this configurable.
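To make the skew rule concrete, here is a minimal sketch in plain Java of the check as I understand it: an output timestamp is accepted only if it is no earlier than the input timestamp minus the allowed skew. The class and method names are hypothetical, java.time stands in for the SDK's own time types, and real runner behaviour may differ.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch, not Beam code: models the allowed-timestamp-skew
// check and the separate question of whether an element is late.
class TimestampSkewSketch {

    // An output timestamp is acceptable if it is not earlier than
    // (input timestamp - allowed skew). Moving forwards always passes.
    static boolean isAllowed(Instant input, Instant output, Duration allowedSkew) {
        return !output.isBefore(input.minus(allowedSkew));
    }

    // The skew setting does not move the watermark, so an accepted element
    // can still be late if its new timestamp is behind the watermark.
    static boolean isLate(Instant output, Instant watermark) {
        return output.isBefore(watermark);
    }

    public static void main(String[] args) {
        Instant input = Instant.ofEpochSecond(1_000);
        Instant forward = input.plusSeconds(100);
        Instant backward = input.minusSeconds(100);

        System.out.println(isAllowed(input, forward, Duration.ZERO));
        System.out.println(isAllowed(input, backward, Duration.ZERO));
        System.out.println(isAllowed(input, backward, Duration.ofSeconds(100)));
        System.out.println(isLate(backward, input));
    }
}
```

In the question's example, adding 100 seconds passes the check with zero skew, and the watermark is simply left where it was.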

On Wed, Jan 25, 2017 at 9:24 PM, Shen Li <[email protected]> wrote:

> Hi,
>
> When reading from a source with no timestamp specified on elements, what
> should be the default timestamp? I presume that it should be 0 as I saw
> PAssertTest trying to set timestamps to very small values with 0 allowed
> timestamp skew. Is that right?
>
> What about the default watermark policy?
>
> If a ParDo modifies the timestamp using
> DoFnProcessContext.outputWithTimestamp, how should that affect the output
> watermark? Say the ParDo adds 100 seconds to the timestamp of each element
> in processElement, how could the runner know it should also add 100 seconds
> to output timestamps?
>
> Thanks,
>
> Shen
>
