Thanks for bringing up this discussion @Luke.

As @Kenn mentioned, in Beam we have defined constant values for the min
timestamp, the max timestamp, and the end of the global window. I noticed that
google.protobuf.Timestamp/Duration is only used in window definitions, such
as FixedWindowsPayload, SlidingWindowsPayload, SessionsPayload, etc.

I think that both RFC 3339 and Beam's current implementation are big enough
to express common window definitions. But users can still define a window
size that falls outside the range of RFC 3339. Conceptually, we should
not limit the time range for windows (although I think the range of RFC 3339
is big enough in most cases).
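
To make the gap concrete, here is a minimal sketch in Python that checks whether a millisecond timestamp fits in the range that google.protobuf.Timestamp documents. The Beam bounds are the millisecond values quoted later in this thread. The helper name fits_in_proto_timestamp is made up for illustration and is not part of Beam or protobuf.

```python
from datetime import datetime, timedelta, timezone

# Beam bounds as quoted in this thread (milliseconds since the Unix epoch).
BEAM_MIN_MS = -9223372036854775
BEAM_MAX_MS = 9223372036854775

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_SECOND = timedelta(seconds=1)

# google.protobuf.Timestamp range: 0001-01-01T00:00:00Z to
# 9999-12-31T23:59:59.999999999Z, expressed here in whole seconds.
PROTO_MIN_S = (datetime(1, 1, 1, tzinfo=timezone.utc) - EPOCH) // ONE_SECOND
PROTO_MAX_S = (
    datetime(9999, 12, 31, 23, 59, 59, tzinfo=timezone.utc) - EPOCH
) // ONE_SECOND


def fits_in_proto_timestamp(millis: int) -> bool:
    """Hypothetical helper: True if `millis` lies inside the Timestamp range."""
    # Floor division mirrors the seconds field of Timestamp; the sub-second
    # remainder would go into the non-negative nanos field.
    seconds = millis // 1000
    return PROTO_MIN_S <= seconds <= PROTO_MAX_S


if __name__ == "__main__":
    for label, value in [("epoch", 0), ("beam min", BEAM_MIN_MS), ("beam max", BEAM_MAX_MS)]:
        print(label, fits_in_proto_timestamp(value))
```

Under these assumptions both Beam bounds fall outside the protobuf range, while ordinary event times near the epoch fit.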

So that everyone knows the background of this discussion, I hope you
don't mind that I put the original conversation thread[1] here.

Best,
Jincheng

[1] https://github.com/apache/beam/pull/10041#discussion_r344380809

Robert Bradshaw <[email protected]> wrote on Tue, Nov 12, 2019 at 4:09 PM:

> I agree about it being a tagged union in the model (together with
> actual_time(...) - epsilon). It's not just a performance hack though,
> it's also (as discussed elsewhere) a question of being able to find an
> embedding into existing datetime libraries. The real question here is
> whether we should limit ourselves to just these 10000 years AD, or
> find value in being able to process events for the lifetime of the
> universe (or, at least, recorded human history). Artificially limiting
> in this way would seem surprising to me at least.
>
> On Mon, Nov 11, 2019 at 11:58 PM Kenneth Knowles <[email protected]> wrote:
> >
> > The max timestamp, min timestamp, and end of the global window are all
> performance hacks in my view. Timestamps in beam are really a tagged union:
> >
> >     timestamp ::= min | max | end_of_global | actual_time(... some
> quantitative timestamp ...)
> >
> > with the ordering
> >
> >     min < actual_time(...) < end_of_global < max
> >
> > We chose arbitrary numbers so that we could do simple numeric
> comparisons and arithmetic.
> >
> > Kenn
> >
> > On Mon, Nov 11, 2019 at 2:03 PM Luke Cwik <[email protected]> wrote:
> >>
> >> While crites@ was investigating using protobuf to represent Apache
> Beam timestamps within the TestStreamEvents, he found out that the well
> known type google.protobuf.Timestamp doesn't support certain timestamps we
> were using in our tests (specifically the max timestamp that Apache Beam
> supports).
> >>
> >> This led me to investigate, and the well known type
> google.protobuf.Timestamp supports dates/times from 0001-01-01T00:00:00Z to
> 9999-12-31T23:59:59.999999999Z which is much smaller than the timestamp
> range that Apache Beam currently supports -9223372036854775ms to
> 9223372036854775ms which is about 292277BC to 294247AD (it was difficult to
> find a time range that represented this).
> >>
> >> Similarly the google.protobuf.Duration represents any time range over
> those ~10000 years. Google decided to limit their range to be compatible
> with the RFC 3339[1] standard, which does simplify many things since it
> guarantees that all RFC 3339 time parsing/manipulation libraries are
> supported.
> >>
> >> Should we:
> >> A) define our own timestamp/duration types to be able to represent the
> full time range that Apache Beam can express?
> >> B) limit the valid timestamps in Apache Beam to some standard such as
> RFC 3339?
> >>
> >> This discussion is somewhat related to the efforts to support nano
> timestamps[2].
> >>
> >> 1: https://tools.ietf.org/html/rfc3339
> >> 2:
> https://lists.apache.org/thread.html/86a4dcabdaa1dd93c9a55d16ee51edcff6266eda05221acbf9cf666d@%3Cdev.beam.apache.org%3E
>
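
The tagged union that Kenn describes in the quoted thread, with the ordering min < actual_time(...) < end_of_global < max, could be sketched roughly as follows in Python. This is only an illustration of the model and not how Beam implements timestamps. The names Tag, Timestamp and actual_time are invented here, and the unit of the quantitative value is left open.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tag(IntEnum):
    """Variant tags, numbered in the order given in the thread."""
    MIN = 0
    ACTUAL = 1
    END_OF_GLOBAL = 2
    MAX = 3


@dataclass(frozen=True, order=True)
class Timestamp:
    # Fields compare as a tuple (tag, value), so the tag decides the order
    # first and the value only matters between two ACTUAL timestamps.
    tag: Tag
    value: int = 0  # only meaningful when tag is ACTUAL


# The three sentinel variants carry no value of their own.
MIN = Timestamp(Tag.MIN)
END_OF_GLOBAL = Timestamp(Tag.END_OF_GLOBAL)
MAX = Timestamp(Tag.MAX)


def actual_time(value: int) -> Timestamp:
    """Wrap some quantitative timestamp (unit left unspecified here)."""
    return Timestamp(Tag.ACTUAL, value)


if __name__ == "__main__":
    stamps = [MAX, actual_time(5), END_OF_GLOBAL, actual_time(-10**30), MIN]
    for stamp in sorted(stamps):
        print(stamp.tag.name, stamp.value)
```

Because the sentinels are separate variants and not reserved numbers, no actual_time value can collide with them, however large or small it is. The cost is a tag check on every comparison, which is the overhead the numeric constants avoid.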
