I agree about it being a tagged union in the model (together with
actual_time(...) - epsilon). It's not just a performance hack though,
it's also (as discussed elsewhere) a question of being able to find an
embedding into existing datetime libraries. The real question here is
whether we should limit ourselves to just these 10000 years AD, or
find value in being able to process events for the lifetime of the
universe (or, at least, recorded human history). Artificially limiting
in this way would seem surprising to me at least.

On Mon, Nov 11, 2019 at 11:58 PM Kenneth Knowles <[email protected]> wrote:
>
> The max timestamp, min timestamp, and end of the global window are all 
> performance hacks in my view. Timestamps in beam are really a tagged union:
>
>     timestamp ::= min | max | end_of_global | actual_time(... some quantitative timestamp ...)
>
> with the ordering
>
>     min < actual_time(...) < end_of_global < max
>
> We chose arbitrary numbers so that we could do simple numeric comparisons and 
> arithmetic.
>
> Kenn
>
> On Mon, Nov 11, 2019 at 2:03 PM Luke Cwik <[email protected]> wrote:
>>
>> While crites@ was investigating using protobuf to represent Apache Beam 
>> timestamps within the TestStreamEvents, he found out that the well known 
>> type google.protobuf.Timestamp doesn't support certain timestamps we were 
>> using in our tests (specifically the max timestamp that Apache Beam 
>> supports).
>>
>> This led me to investigate, and the well known type 
>> google.protobuf.Timestamp supports dates/times from 0001-01-01T00:00:00Z to 
>> 9999-12-31T23:59:59.999999999Z which is much smaller than the timestamp 
>> range that Apache Beam currently supports, -9223372036854775ms to 
>> 9223372036854775ms, which is about 290309 BC to 294247 AD (it was difficult to 
>> find a time range that represented this).
>>
>> Similarly, google.protobuf.Duration represents any time range over those 
>> ~10000 years. Google decided to limit their range to be compatible with the 
>> RFC 3339[1] standard, which does simplify many things since it guarantees 
>> that all RFC 3339 time parsing/manipulation libraries are supported.
>>
>> Should we:
>> A) define our own timestamp/duration types to be able to represent the full 
>> time range that Apache Beam can express?
>> B) limit the valid timestamps in Apache Beam to some standard such as RFC 
>> 3339?
>>
>> This discussion is somewhat related to the efforts to support nano 
>> timestamps[2].
>>
>> 1: https://tools.ietf.org/html/rfc3339
>> 2: 
>> https://lists.apache.org/thread.html/86a4dcabdaa1dd93c9a55d16ee51edcff6266eda05221acbf9cf666d@%3Cdev.beam.apache.org%3E
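
For reference, the range mismatch described above can be checked with a small Python sketch. The Beam bounds are the millisecond values quoted in the thread. The protobuf bounds are derived from the documented 0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z range, truncated to milliseconds. The helper names (to_epoch_ms, fits_in_protobuf_timestamp) are hypothetical, not part of Beam or protobuf.

```python
from datetime import datetime, timezone

# Beam's timestamp bounds in milliseconds since the Unix epoch (from the thread).
BEAM_MIN_MS = -9223372036854775
BEAM_MAX_MS = 9223372036854775

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Convert an aware datetime to whole milliseconds since the Unix epoch."""
    delta = dt - EPOCH
    return delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000


# google.protobuf.Timestamp range, truncated to millisecond precision.
PROTO_MIN_MS = to_epoch_ms(datetime(1, 1, 1, tzinfo=timezone.utc))
PROTO_MAX_MS = to_epoch_ms(
    datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc))


def fits_in_protobuf_timestamp(ms):
    """True if a millisecond timestamp lies inside the protobuf Timestamp range."""
    return PROTO_MIN_MS <= ms <= PROTO_MAX_MS


# Approximate span on each side of the epoch, using the mean Gregorian year.
MS_PER_YEAR = 365.2425 * 86_400_000
beam_span_years = BEAM_MAX_MS / MS_PER_YEAR
```

Under these assumptions neither BEAM_MIN_MS nor BEAM_MAX_MS passes fits_in_protobuf_timestamp, and beam_span_years comes out at a little over 292277 years on each side of 1970.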
