As you said, this would be update-incompatible across all streaming pipelines. At the very least, this would be a big problem for Dataflow users, and I believe for many Flink users as well. I'm not sure the benefit here justifies causing problems for so many users.
Reuven

On Wed, Nov 7, 2018 at 4:56 PM Robert Bradshaw <rober...@google.com> wrote:

> Yes, microseconds is a good compromise for covering a long enough timespan that there's little reason it could be hit (even for processing historical data).
>
> Regarding backwards compatibility, could we just change the internal representation of Beam's element timestamps, possibly with new APIs to access the finer granularity? (True, it may not be upgrade-compatible.)
>
> On Tue, Nov 6, 2018 at 8:46 PM Reuven Lax <re...@google.com> wrote:
> >
> > The main difference (though possibly theoretical) is when time runs out. With 64 bits and nanosecond precision, we can only represent times about 292 years in the future (or the past).
> >
> > On Tue, Nov 6, 2018 at 11:30 AM Kenneth Knowles <k...@apache.org> wrote:
> >>
> >> I like nanoseconds as extremely future-proof. What about spec'ing this out in stages: (1) domain of values, (2) portable encoding that can represent those values, (3) language-specific types to embed the values in.
> >>
> >> 1. If it is a nanosecond-precision absolute time, and we eventually want to migrate event-time timestamps to match, then we need values for "end of global window" and "end of time". TBH I am not sure we need both of these any more. We can either define a max on the nanosecond range or create distinguished values.
> >>
> >> 2. For portability, presumably an order-preserving integer encoding of nanoseconds since the epoch, with whatever tweaks are needed to represent the end of time. It might be useful to find a way to allow multiple encodings. Not super useful at any particular version, but it might give us a migration path. It would also allow experiments for performance.
> >>
> >> 3. We could probably find a way to keep user-facing API compatibility here while increasing the underlying precision at 1 and 2, but it's probably not worth it. A new Java type IMO addresses the lossiness issue because a user would have to explicitly request truncation to assign to a millis event-time timestamp.
> >>
> >> Kenn
> >>
> >> On Tue, Nov 6, 2018 at 12:55 AM Charles Chen <c...@google.com> wrote:
> >>>
> >>> Is the proposal to do this for both Beam Schema DATETIME fields as well as for Beam timestamps in general? The latter likely has a bunch of downstream consequences for all runners.
> >>>
> >>> On Tue, Nov 6, 2018 at 12:38 AM Ismaël Mejía <ieme...@gmail.com> wrote:
> >>>>
> >>>> +1 to more precision, even to the nano level, probably via Reuven's proposal of a different internal representation.
> >>>>
> >>>> On Tue, Nov 6, 2018 at 9:19 AM Robert Bradshaw <rober...@google.com> wrote:
> >>>> >
> >>>> > +1 to offering more granular timestamps in general. I think it would be odd if setting the element timestamp from a row DATETIME field were lossy, so we should seriously consider upgrading that as well.
> >>>> >
> >>>> > On Tue, Nov 6, 2018 at 6:42 AM Charles Chen <c...@google.com> wrote:
> >>>> > >
> >>>> > > One related issue that came up before is that we (perhaps unnecessarily) restrict the precision of timestamps in the Python SDK to milliseconds because of legacy reasons related to the Java runner's use of Joda time. Perhaps Beam portability should natively use a more granular timestamp unit.
> >>>> > >
> >>>> > > On Mon, Nov 5, 2018 at 9:34 PM Rui Wang <ruw...@google.com> wrote:
> >>>> > >>
> >>>> > >> Thanks Reuven!
> >>>> > >>
> >>>> > >> I think Reuven gives a third option:
> >>>> > >>
> >>>> > >> Change the internal representation of the DATETIME field in Row. Still keep the public ReadableDateTime getDateTime(String fieldName) API to be compatible with existing code, and add one more API, getDateTimeNanosecond. This option is different from option one because option one actually maintains two implementations of time.
> >>>> > >>
> >>>> > >> -Rui
> >>>> > >>
> >>>> > >> On Mon, Nov 5, 2018 at 9:26 PM Reuven Lax <re...@google.com> wrote:
> >>>> > >>>
> >>>> > >>> I would vote that we change the internal representation of Row to something other than Joda. Java 8 time would give us at least microseconds, and if we want nanoseconds we could simply store it as a number.
> >>>> > >>>
> >>>> > >>> We should still keep accessor methods that return and take Joda objects, as the rest of Beam still depends on Joda.
> >>>> > >>>
> >>>> > >>> Reuven
> >>>> > >>>
> >>>> > >>> On Mon, Nov 5, 2018 at 9:21 PM Rui Wang <ruw...@google.com> wrote:
> >>>> > >>>>
> >>>> > >>>> Hi Community,
> >>>> > >>>>
> >>>> > >>>> The DATETIME field in Beam Schema/Row is implemented with Joda's DateTime (see Row.java#L611 and Row.java#L169). Joda's DateTime is limited to millisecond precision. That is good enough to represent event-time timestamps, but it is not enough for real "time" data. For "time"-typed data, we probably need to support precision up to the nanosecond.
> >>>> > >>>>
> >>>> > >>>> Unfortunately, Joda decided to keep millisecond precision: https://github.com/JodaOrg/joda-time/issues/139.
> >>>> > >>>>
> >>>> > >>>> If we want to support nanosecond precision, we have two options:
> >>>> > >>>>
> >>>> > >>>> Option one: utilize the current FieldType's metadata field, so that we could set something in the metadata and Row could check it to decide what is saved in the DATETIME field: Joda's DateTime, or an implementation that supports nanoseconds.
> >>>> > >>>>
> >>>> > >>>> Option two: add another field type (maybe called TIMESTAMP?) with an implementation that supports higher time precision.
> >>>> > >>>>
> >>>> > >>>> What do you think about the need for higher time precision, and which option is preferred?
> >>>> > >>>>
> >>>> > >>>> -Rui
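To make the ideas above concrete, here is a minimal sketch of the kind of value type being discussed: a signed 64-bit count of nanoseconds since the epoch (hence the roughly +/- 292 year range Reuven mentions), with the two largest long values reserved as distinguished "end of global window" and "end of time" sentinels (Kenn's point 1), and an explicitly lossy accessor that truncates to a millisecond Joda DateTime for compatibility with existing code (the Reuven/Rui option). All names here (NanoTimestamp, ofEpochNanos, toJodaDateTime, and so on) are hypothetical illustrations, not actual Beam API.

    import org.joda.time.DateTime;
    import org.joda.time.DateTimeZone;

    public final class NanoTimestamp implements Comparable<NanoTimestamp> {
      // Reserve the top two long values as distinguished sentinels, so they
      // sort after every ordinary timestamp.
      public static final NanoTimestamp END_OF_TIME =
          new NanoTimestamp(Long.MAX_VALUE);
      public static final NanoTimestamp END_OF_GLOBAL_WINDOW =
          new NanoTimestamp(Long.MAX_VALUE - 1);

      private final long epochNanos;

      private NanoTimestamp(long epochNanos) {
        this.epochNanos = epochNanos;
      }

      public static NanoTimestamp ofEpochNanos(long epochNanos) {
        return new NanoTimestamp(epochNanos);
      }

      // Lossless upgrade path from existing millisecond timestamps;
      // throws ArithmeticException rather than silently overflowing.
      public static NanoTimestamp ofEpochMillis(long epochMillis) {
        return new NanoTimestamp(Math.multiplyExact(epochMillis, 1_000_000L));
      }

      public long getEpochNanos() {
        return epochNanos;
      }

      // Explicitly lossy: truncates to millisecond precision for Joda-based
      // callers, so existing getDateTime()-style accessors can keep working.
      // floorDiv keeps truncation consistent for pre-epoch timestamps.
      public DateTime toJodaDateTime() {
        return new DateTime(Math.floorDiv(epochNanos, 1_000_000L), DateTimeZone.UTC);
      }

      // Signed longs compare in timestamp order, so the sentinels sort last.
      @Override
      public int compareTo(NanoTimestamp other) {
        return Long.compare(epochNanos, other.epochNanos);
      }

      public static void main(String[] args) {
        NanoTimestamp t = NanoTimestamp.ofEpochMillis(System.currentTimeMillis());
        System.out.println(t.getEpochNanos() + " ns since epoch = " + t.toJodaDateTime());
        System.out.println(t.compareTo(END_OF_GLOBAL_WINDOW) < 0); // true
      }
    }

A user can only get a millisecond value out by explicitly calling toJodaDateTime(), which is the property Kenn's point 3 asks for: truncation has to be requested, never silent.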
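And for the order-preserving portable encoding in Kenn's point 2, a minimal sketch under the same assumptions: the standard trick of flipping the sign bit of the big-endian two's-complement representation makes unsigned lexicographic byte comparison agree with numeric order, so the end-of-time sentinels also sort last byte-wise. Again purely illustrative, not an actual Beam coder.

    import java.util.Arrays;

    public final class OrderPreservingLongCoder {
      // Encode a signed 64-bit nanosecond timestamp as 8 big-endian bytes
      // with the sign bit flipped, so that unsigned lexicographic comparison
      // of the encoded bytes matches the numeric order of the original longs.
      public static byte[] encode(long epochNanos) {
        long biased = epochNanos ^ Long.MIN_VALUE; // flip the sign bit
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
          out[i] = (byte) (biased >>> (56 - 8 * i));
        }
        return out;
      }

      public static long decode(byte[] bytes) {
        long biased = 0;
        for (int i = 0; i < 8; i++) {
          biased = (biased << 8) | (bytes[i] & 0xFFL);
        }
        return biased ^ Long.MIN_VALUE; // restore the sign bit
      }

      // Unsigned lexicographic comparison of two encodings.
      public static int compareEncoded(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
          int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
          if (cmp != 0) {
            return cmp;
          }
        }
        return 0;
      }

      public static void main(String[] args) {
        long earlier = -1_000_000_000L; // one second before the epoch
        long later = 1_000_000_000L;    // one second after the epoch
        byte[] a = encode(earlier);
        byte[] b = encode(later);
        System.out.println(compareEncoded(a, b) < 0);             // true: order preserved
        System.out.println(decode(a) == earlier);                 // true: round trip
        System.out.println(Arrays.equals(encode(decode(b)), b));  // true
      }
    }

A fixed 8-byte, order-preserving encoding like this is also what would let runners compare and sort encoded timestamps (e.g. in timer queues) without decoding them first.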