Andrew and others, Thanks for your input.
> > 3. For functions (e.g. date_trunc(...)), I think the infrastructure for > multiple type signatures exists, we just need implementations for different > resolutions Sure, we can add more type support in date_trunc() and other functions. > 4. For converting back and forth from different resolutions in SQL Server > > For 3, given there is seemingly no SQL standard way to do this, I suggest > adding a new, non-postgres function like `convert_resolution` that performs > the desired conversion. This function could be implemented today as a user > defined function (UDF), and/or included as a built in function in > DataFusion in the future. What do folks think about adding functions like the following to Arrow? * to_timestamp_millis(STRING) -> converts string timestamps to timestamps with millis resolution * to_timestamp_millis(Timestamp(xxx)) -> converts or truncates other timestamp column types to millis resolution * to_timestamp_micros(…) * to_timestamp_seconds(…) -Evan > > Andrew > > [1] https://github.com/apache/arrow/pull/10005#discussion_r612551640 > <https://github.com/apache/arrow/pull/10005#discussion_r612551640> > > > > On Thu, Apr 15, 2021 at 4:41 PM Evan Chan <e...@urbanlogiq.com > <mailto:e...@urbanlogiq.com>> wrote: > >> Hi folks, >> >> So currently Arrow Rust/DataFusion supports four types of Timestamp >> arrays, with Nano, Micro, Millisecond and Second resolution. However, the >> best supported by far are Nanos. For example, in DataFusion, the following >> only works for Nanos and not the other resolutions: >> * CAST(x as TIMESTAMP) -> Nanos only >> * date_trunc() -> nanos only >> * filtering a timestamp array, eg my column > >> to_timestamp(‘2020-06-30T12:00Z’) >> >> In the broader SQL world, in general there seems to be only a single >> Timestamp type in most databases, though in many cases there is a variable >> resolution. >> PostGres’s TIMESTAMP type is microsecond based: >> https://www.postgresql.org/docs/9.1/datatype-datetime.html < >> https://www.postgresql.org/docs/9.1/datatype-datetime.html >> <https://www.postgresql.org/docs/9.1/datatype-datetime.html>> >> >> For UrbanLogiq, in some cases we would like to standardize on millisecond >> resolution. Nanoseconds yields only ~300 years of span for i64, which >> isn’t enough for some applications. At minimum, we’d like to get the >> following working: >> * Either something like CAST(x AS TIMESTAMP(Milliseconds)) or >> date_trunc(‘milliseconds’, x) , which can cast different types of timestamp >> arrays to Timestamp(Milliseconds, None) >> * filtering that can compare, ideally, different types of timestamp >> columns to a to_timestamp(….) >> >> The last problem is easy to solve as the coercion logic can be fixed to >> address that, but solving the first problem is not as straightforward. >> - There isn’t any universal SQL standard for a type that supports >> different timestamp resolutions >> - The most non-intrusive way I can think of is: >> - CAST(x AS TIMESTAMP) -> Nanos >> - CAST(x AS TIMESTAMP(Micros/Millis/Seconds)) ->. >> Micros/Millis/Seconds arrays >> - Functions are designed/work best with a single output type, and >> date_trunc() is designed to output nanos only. Fixing this would not be >> trivial, it would probably require changing the signature of return_type() >> so that return types can be determined from argument values, not just >> argument types >> >> Basically the larger question for the Arrow/DataFusion community is how do >> we want to deal with supporting different timestamp types. The ideal would >> be that different functions work on all the timestamp types, but it’ll take >> a long time to get there I fear. Some possible directions: >> >> - Continue current support for nanos as the “first class” citizen, but add >> support for casting to different timestamp resolutions, and coercion to >> nanos to work with different functions like date_trunc(). >> - This means Arrow would not be usable in some cases. Converting from >> micros to nanos loses year range (for 64-bits anyways) >> - There is a performance penalty for the coercion >> - Switch to different basis for the “base” timestamp type, like PostGres >> did with micros >> - Use more than 64 bits to represent nanos >> - Add support for functions to work with different timestamp types >> - For example, cast() produces different timestamp resolutions >> - Other date functions like date_trunc() can input from different >> resolutions >> - Signatures and return type calculation would be more complex >> - Switch to a universal timestamp type which supports different >> resolutions, as some SQL databases support >> >> Thanks for your input, >> Evan