Thanks for starting this discussion, @Luke. As @Kenn mentioned, Beam defines constant values for the min and max timestamps and for the end of the global window. I noticed that google.protobuf.Timestamp/Duration is only used in window definitions, such as FixedWindowsPayload, SlidingWindowsPayload, SessionsPayload, etc.
I think that both RFC 3339 and Beam's current implementation are large enough to express common window definitions. However, users can in principle define a window size that falls outside the range of RFC 3339. Conceptually, we should not limit the time range for windows (although I think the range of RFC 3339 is big enough in most cases).

To make sure everyone knows the background of this discussion, I hope you don't mind that I link the original conversation thread [1] here.

Best,
Jincheng

[1] https://github.com/apache/beam/pull/10041#discussion_r344380809

Robert Bradshaw <[email protected]> wrote on Tue, Nov 12, 2019 at 4:09 PM:

> I agree about it being a tagged union in the model (together with
> actual_time(...) - epsilon). It's not just a performance hack though,
> it's also (as discussed elsewhere) a question of being able to find an
> embedding into existing datetime libraries. The real question here is
> whether we should limit ourselves to just these 10000 years AD, or
> find value in being able to process events for the lifetime of the
> universe (or, at least, recorded human history). Artificially limiting
> in this way would seem surprising to me at least.
>
> On Mon, Nov 11, 2019 at 11:58 PM Kenneth Knowles <[email protected]> wrote:
> >
> > The max timestamp, min timestamp, and end of the global window are all
> > performance hacks in my view. Timestamps in Beam are really a tagged union:
> >
> >     timestamp ::= min | max | end_of_global | actual_time(... some quantitative timestamp ...)
> >
> > with the ordering
> >
> >     min < actual_time(...) < end_of_global < max
> >
> > We chose arbitrary numbers so that we could do simple numeric
> > comparisons and arithmetic.
> >
> > Kenn
> >
> > On Mon, Nov 11, 2019 at 2:03 PM Luke Cwik <[email protected]> wrote:
> >>
> >> While crites@ was investigating using protobuf to represent Apache
> >> Beam timestamps within the TestStreamEvents, he found out that the
> >> well-known type google.protobuf.Timestamp doesn't support certain
> >> timestamps we were using in our tests (specifically the max timestamp
> >> that Apache Beam supports).
> >>
> >> This led me to investigate, and the well-known type
> >> google.protobuf.Timestamp supports dates/times from 0001-01-01T00:00:00Z
> >> to 9999-12-31T23:59:59.999999999Z, which is much smaller than the
> >> timestamp range that Apache Beam currently supports: -9223372036854775ms
> >> to 9223372036854775ms, which is about 292277BC to 294247AD (it was
> >> difficult to find a time range that represented this).
> >>
> >> Similarly, google.protobuf.Duration represents any time range over
> >> those ~10000 years. Google decided to limit their range to be compatible
> >> with the RFC 3339 [1] standard, which does simplify many things since it
> >> guarantees that all RFC 3339 time parsing/manipulation libraries are
> >> supported.
> >>
> >> Should we:
> >> A) define our own timestamp/duration types to be able to represent the
> >> full time range that Apache Beam can express?
> >> B) limit the valid timestamps in Apache Beam to some standard such as
> >> RFC 3339?
> >>
> >> This discussion is somewhat related to the efforts to support nano
> >> timestamps [2].
> >>
> >> 1: https://tools.ietf.org/html/rfc3339
> >> 2: https://lists.apache.org/thread.html/86a4dcabdaa1dd93c9a55d16ee51edcff6266eda05221acbf9cf666d@%3Cdev.beam.apache.org%3E
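
For anyone following along, here is a minimal Python sketch of the range mismatch Luke describes above. It is only an illustration, not Beam or protobuf code: the names (BEAM_MIN_MILLIS, fits_in_proto_timestamp, etc.) are hypothetical, the Beam bounds are copied from the numbers quoted in this thread, and the protobuf bounds are the epoch-second equivalents of 0001-01-01T00:00:00Z and 9999-12-31T23:59:59Z.

```python
# Hypothetical names for illustration; none of these are Beam or protobuf APIs.

# Beam timestamp bounds as quoted in this thread, in milliseconds since the Unix epoch.
BEAM_MIN_MILLIS = -9223372036854775
BEAM_MAX_MILLIS = 9223372036854775

# google.protobuf.Timestamp bounds in seconds since the Unix epoch:
# 0001-01-01T00:00:00Z and 9999-12-31T23:59:59Z.
PROTO_MIN_SECONDS = -62135596800
PROTO_MAX_SECONDS = 253402300799


def fits_in_proto_timestamp(millis: int) -> bool:
    """Return True if a millisecond timestamp lies inside the protobuf range."""
    # Floor division keeps the sub-second remainder non-negative, which is
    # how Timestamp splits a value into seconds and nanos.
    seconds = millis // 1000
    return PROTO_MIN_SECONDS <= seconds <= PROTO_MAX_SECONDS


if __name__ == "__main__":
    print(fits_in_proto_timestamp(0))                # the Unix epoch
    print(fits_in_proto_timestamp(BEAM_MAX_MILLIS))  # Beam's max timestamp
    print(fits_in_proto_timestamp(BEAM_MIN_MILLIS))  # Beam's min timestamp
```

Under these assumptions, ordinary event times fit, while Beam's min and max timestamps do not, which is the case crites@ ran into with TestStreamEvents.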
