Andrew and others,

Thanks for your input.

> 
> 3. For functions (e.g. date_trunc(...)), I think the infrastructure for
> multiple type signatures exists, we just need implementations for different
> resolutions

Sure, we can add more type support in date_trunc() and other functions.

> 4. For converting back and forth from different resolutions in SQL Server
> 
> For 3, given there is seemingly no SQL standard way to do this, I suggest
> adding a new, non-postgres function like `convert_resolution` that performs
> the desired conversion. This function could be implemented today as a user
> defined function (UDF), and/or included as a built in function in
> DataFusion in the future.

What do folks think about adding functions like the following to Arrow?

* to_timestamp_millis(STRING) -> converts string timestamps to timestamps with 
millis resolution
* to_timestamp_millis(Timestamp(xxx)) -> converts or truncates other timestamp 
column types to millis resolution
* to_timestamp_micros(…)
* to_timestamp_seconds(…)

-Evan

> 
> Andrew
> 
> [1]  https://github.com/apache/arrow/pull/10005#discussion_r612551640 
> <https://github.com/apache/arrow/pull/10005#discussion_r612551640>
> 
> 
> 
> On Thu, Apr 15, 2021 at 4:41 PM Evan Chan <e...@urbanlogiq.com 
> <mailto:e...@urbanlogiq.com>> wrote:
> 
>> Hi folks,
>> 
>> So currently Arrow Rust/DataFusion supports four types of Timestamp
>> arrays, with Nano, Micro, Millisecond and Second resolution.  However, the
>> best supported by far are Nanos.  For example, in DataFusion, the following
>> only works for Nanos and not the other resolutions:
>> * CAST(x as TIMESTAMP)  -> Nanos only
>> * date_trunc()  -> nanos only
>> * filtering a timestamp array, eg my column >
>> to_timestamp(‘2020-06-30T12:00Z’)
>> 
>> In the broader SQL world, in general there seems to be only a single
>> Timestamp type in most databases, though in many cases there is a variable
>> resolution.
>> PostGres’s TIMESTAMP type is microsecond based:
>> https://www.postgresql.org/docs/9.1/datatype-datetime.html <
>> https://www.postgresql.org/docs/9.1/datatype-datetime.html 
>> <https://www.postgresql.org/docs/9.1/datatype-datetime.html>>
>> 
>> For UrbanLogiq, in some cases we would like to standardize on millisecond
>> resolution.  Nanoseconds yields only ~300 years of span for i64, which
>> isn’t enough for some applications.  At minimum, we’d like to get the
>> following working:
>> * Either something like CAST(x AS TIMESTAMP(Milliseconds)) or
>> date_trunc(‘milliseconds’, x) , which can cast different types of timestamp
>> arrays to Timestamp(Milliseconds, None)
>> * filtering that can compare, ideally, different types of timestamp
>> columns to a to_timestamp(….)
>> 
>> The last problem is easy to solve as the coercion logic can be fixed to
>> address that, but solving the first problem is not as straightforward.
>> - There isn’t any universal SQL standard for a type that supports
>> different timestamp resolutions
>> - The most non-intrusive way I can think of is:
>>    - CAST(x AS TIMESTAMP)   -> Nanos
>>    - CAST(x AS TIMESTAMP(Micros/Millis/Seconds)) ->.
>> Micros/Millis/Seconds arrays
>> - Functions are designed/work best with a single output type, and
>> date_trunc() is designed to output nanos only.  Fixing this would not be
>> trivial, it would probably require changing the signature of return_type()
>> so that return types can be determined from argument values, not just
>> argument types
>> 
>> Basically the larger question for the Arrow/DataFusion community is how do
>> we want to deal with supporting different timestamp types.  The ideal would
>> be that different functions work on all the timestamp types, but it’ll take
>> a long time to get there I fear.  Some possible directions:
>> 
>> - Continue current support for nanos as the “first class” citizen, but add
>> support for casting to different timestamp resolutions, and coercion to
>> nanos to work with different functions like date_trunc().
>>    - This means Arrow would not be usable in some cases.  Converting from
>> micros to nanos loses year range (for 64-bits anyways)
>>    - There is a performance penalty for the coercion
>> - Switch to different basis for the “base” timestamp type, like PostGres
>> did with micros
>> - Use more than 64 bits to represent nanos
>> - Add support for functions to work with different timestamp types
>>    - For example, cast() produces different timestamp resolutions
>>    - Other date functions like date_trunc() can input from different
>> resolutions
>>    - Signatures and return type calculation would be more complex
>> - Switch to a universal timestamp type which supports different
>> resolutions, as some SQL databases support
>> 
>> Thanks for your input,
>> Evan

Reply via email to