Thanks, everyone, for your input. Interoperability would be the biggest issue. How much does C++ do with the timezone string?
-Evan

> On Jul 7, 2021, at 1:33 PM, Weston Pace <weston.p...@gmail.com> wrote:
>
> I don't know about removal but you could probably ignore the timezone
> string and it's not clear the issues would be that significant.
>
> If Rust never produces a non-null non-UTC timestamp then I don't see
> that as an issue.
>
> If you are consuming data with a timestamp string other than UTC it
> isn't really clear what information that timestamp string is supposed
> to convey anyways. Are you supposed to extract fields as if you were
> in that time zone? Or does this indicate the time zone the data was
> captured in? Postgresql, etc. do not support this concept. Probably
> the safest thing to do would be to reject the data.
>
> There still remains the question of whether or not you need to
> distinguish between local times and instant times. Or, in python
> terms, naive vs non-naive. Or, in parquet terms, whether you need to
> worry about the isAdjustedToUtc flag. Or, in postgres terms, whether
> you need to distinguish between "timestamp with timezone" and
> "timestamp without timezone".
>
> This boils down to whether you want to support the constraints offered
> by these semantic hints from the user or not. For example, forbidding
> comparison between the two types of timestamps or altering how you
> display them. If those features are not important, then Rust could
> ignore the time zone field completely. That could cause an
> interoperability issue though (e.g. data going into rust with timezone
> UTC comes back out with no timezone even though nothing changed).
> Ideally rust could ignore the time zone string but leave it unchanged.
>
> On Wed, Jul 7, 2021 at 6:58 AM Joris Van den Bossche
> <jorisvandenboss...@gmail.com> wrote:
>>
>> On Wed, 7 Jul 2021 at 18:46, Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> AFAIK timezone is part of the spec.
>>
>> And for reference, the current spec (Schema flatbuffer file) for timestamp
>> is at
>> https://github.com/apache/arrow/blob/6c8d30ea82222fd2750b999840872d3f6cbdc8f8/format/Schema.fbs#L217-L247.
>>
>>> In Python, that would be [1]
>>>
>>> import pyarrow as pa
>>> dt1 = pa.timestamp("ms", "+00:10")
>>> dt2 = pa.timestamp("ms")
>>>
>>> arrow-rs is not very consistent with how it handles it. imo that is an
>>> artifact of being currently difficult (API wise) to create an array with a
>>> timezone, which has caused people to not use it much (and thus not
>>> implement kernels with it / test it properly).
>>>
>>> I do not see how removing it would be compatible with the Arrow spec,
>>> though.
>>>
>>> Best,
>>> Jorge
>>>
>>> [1] https://arrow.apache.org/docs/python/generated/pyarrow.timestamp.html
>>>
>>> On Wed, Jul 7, 2021 at 6:37 PM Evan Chan <e...@urbanlogiq.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> Some of us are having a discussion about a direction change for Rust Arrow
>>>> timestamp types, which currently support both a resolution field (Ns, Micros,
>>>> Ms, Seconds) similar to the other language implementations, but also
>>>> optionally a timezone string field. I believe the timezone field is
>>>> unique to the Rust implementation, as I don’t find it in the C/C++ and
>>>> Python docs. At the same time, in reality if the timezone field is non
>>>> null, this is not well supported at all in the current code. Functions
>>>> returning timestamps pretty much all return a null timezone, for example,
>>>> and don’t allow the timezone to be specified.
>>>>
>>>> The proposal would be to eliminate the timezone field and bring the Rust
>>>> Arrow timestamp type in line with that of the other language
>>>> implementations, also simplifying implementation. It seems this is in
>>>> line with the direction of other projects (Parquet, Spark, and most DBs have
>>>> timestamp types which do not have explicit timezones or are implicitly UTC).
>>>>
>>>> Please feel free to see
>>>> https://github.com/apache/arrow-datafusion/issues/686
>>>> (Or would it be better to discuss here in mailing list?)
>>>>
>>>> Cheers!
>>>> Evan
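
For readers following the thread, here is a rough sketch of two points raised above, using only the Python standard library rather than any Arrow implementation. The first half shows the naive vs non-naive distinction and how Python itself forbids ordering across the two kinds. The second half illustrates the "ignore the time zone string but leave it unchanged" idea. `TimestampType`, `can_order`, and `add_millis` are hypothetical names invented for this illustration and are not part of pyarrow or arrow-rs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Naive ("local") timestamp: no time zone attached.
naive = datetime(2021, 7, 7, 13, 33)

# Aware ("instant") timestamp: pinned to UTC.
aware = datetime(2021, 7, 7, 13, 33, tzinfo=timezone.utc)


def can_order(a: datetime, b: datetime) -> bool:
    """Return True if Python allows ordering a against b."""
    try:
        a < b
        return True
    except TypeError:
        # Raised when one operand is naive and the other is aware.
        return False


@dataclass(frozen=True)
class TimestampType:
    """Hypothetical stand-in for a timestamp type: a unit plus an optional tz."""
    unit: str
    tz: Optional[str] = None


def add_millis(
    values: List[int], ts_type: TimestampType, delta_ms: int
) -> Tuple[List[int], TimestampType]:
    """Hypothetical kernel that never inspects tz but passes the type through."""
    return [v + delta_ms for v in values], ts_type


print(can_order(naive, aware))   # mixing naive and aware is rejected
print(can_order(aware, aware))   # same kind compares fine

shifted, out_type = add_millis([0, 1000], TimestampType("ms", "UTC"), 5)
print(shifted, out_type)         # the tz string survives the round trip
```

Because `add_millis` returns the type it was given, data that enters tagged as UTC leaves tagged as UTC, which is the interoperability concern mentioned in the thread.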