alamb commented on issue #7697: URL: https://github.com/apache/arrow-datafusion/issues/7697#issuecomment-1741131533
@mhilton and I just had a chat and here are some notes:

The way the code currently works is that `date_bin` lists several signatures it can handle, such as
https://github.com/apache/arrow-datafusion/blob/d19e9d684bbe1fd820674d48a96795bfbea9db7d/datafusion/expr/src/built_in_function.rs#L1041-L1046

If the user specifies arguments to that function that don't match the required types (e.g. a String), the datafusion coercion logic kicks in and adds casts on the specific arguments to make them conform to one of the available signatures.

The current convention is that `Timestamp(Second, Some("+TZ"))` effectively means "any `Timestamp` type with a timezone", and then the implementation of the function (in this case `date_bin`) must handle any possible timezone that comes in.

The problem with the current convention is that the coercion rules for `Timestamp(Second, Some("+TZ"))`
https://github.com/apache/arrow-datafusion/blob/d19e9d684bbe1fd820674d48a96795bfbea9db7d/datafusion/expr/src/type_coercion/functions.rs#L222-L226

only support casting from another timestamp, not from Strings, the way `Timestamp(Second, None)` does:
https://github.com/apache/arrow-datafusion/blob/d19e9d684bbe1fd820674d48a96795bfbea9db7d/datafusion/expr/src/type_coercion/functions.rs#L209-L216

Thus I think the solution @mhilton and I brainstormed is to change the signature to `Timestamp(Second, Some("+00:00"))` (aka only accept UTC timestamps) and teach the coercion logic to cast all arguments to that type. Then the implementation of `date_bin` only needs to handle one timezone (UTC).

cc @wiedld

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
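
For illustration only, the coercion rule proposed in the comment above could be sketched roughly as follows. This is not the DataFusion implementation: `DataType` and `TimeUnit` here are cut-down, hypothetical stand-ins for the arrow types so the snippet compiles without external crates, and `can_coerce` and `coerce_arg` are made-up helper names. The rules are simplified to what the comment describes (strings and timestamps only).

```rust
// Hypothetical, cut-down stand-ins for arrow's TimeUnit and DataType.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Int64,
    Timestamp(TimeUnit, Option<String>),
}

// The single timezone that the proposed signature would accept.
const UTC: &str = "+00:00";

/// Returns true if an argument of type `from` may be cast to the
/// signature type `to` (simplified model, assumption-laden).
fn can_coerce(from: &DataType, to: &DataType) -> bool {
    match to {
        // Timezone-less timestamps: strings and any timestamp are accepted.
        DataType::Timestamp(_, None) => {
            matches!(from, DataType::Utf8 | DataType::Timestamp(_, _))
        }
        // Proposed rule: UTC timestamps accept the same inputs, so the
        // function body only ever sees UTC values.
        DataType::Timestamp(_, Some(tz)) if tz.as_str() == UTC => {
            matches!(from, DataType::Utf8 | DataType::Timestamp(_, _))
        }
        // Any other timezone: only another timestamp, as described for
        // the existing "+TZ" convention.
        DataType::Timestamp(_, Some(_)) => matches!(from, DataType::Timestamp(_, _)),
        // Everything else must match exactly.
        _ => from == to,
    }
}

/// Picks the first signature type the argument can be coerced to, if any.
fn coerce_arg(arg: &DataType, signature: &[DataType]) -> Option<DataType> {
    signature
        .iter()
        .find(|candidate| can_coerce(arg, candidate))
        .cloned()
}
```

Under this model, calling `coerce_arg` with a `Utf8` argument and a signature containing `Timestamp(Second, Some("+00:00"))` yields that UTC type, whereas a signature carrying a non-UTC placeholder timezone rejects the string, which is the gap the comment points out.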
