velvia commented on issue #686: URL: https://github.com/apache/arrow-datafusion/issues/686#issuecomment-885231416
Hi folks,

Excellent discussion; I need to go back and reread the detailed scenarios that Andrew and others laid out. I did propose a second argument to the `to_timestamp()` functions, though I’m slightly leaning towards separate functions, with one specifically for timezone conversion or designation, mainly for clarity.

To help us think it through, though, let’s take the default case where `to_timestamp()` does NOT have a second argument, which most people (definitely those coming from other SQL dialects) would omit. Arrow/DataFusion currently must know the output type, so we have to pick one. Say it is `Timestamp(_, UTC)`. Would this be correct? (UTC seems to be the only

The cases are:

- String with no clear timezone designation. In this case, the local (server) timezone might be used to interpret it, and the result could then be converted to a UTC-based timestamp. There is ambiguity, though one can argue that if the original string was UTC, it should have a “Z” or an appropriate offset anyway.
- String with a timezone designation (+/- offset, or Z). In this case we get the timezone from the string and can convert the output to be UTC-based.
- int64/uint64. This is a numeric timestamp from the epoch, but there is again ambiguity: do we treat it as local-timezone-based or UTC-based?
- Timestamp column. If the source is already a timestamp column, the user probably wants to convert units (say millis to nanos) and preserve the timezone. This is a case where a fixed output type does not really work well, but here we can specify that DataFusion produce an array with the same timezone as the input, so it’s fine.

I apologize in advance if I’m rehashing discussions from earlier in the thread; I still need to catch up.
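To make the four cases concrete, here is a minimal Python sketch of the semantics under discussion. It assumes the one-argument default produces a UTC-based nanosecond timestamp. Everything in it is hypothetical: `to_timestamp`, `Timestamp`, and `TimestampMillis` are illustrative stand-ins, not DataFusion or Arrow APIs. The server’s local timezone is passed in explicitly as `local_tz`. Integers are assumed to be UTC-based epoch seconds, which is only one of the two readings raised above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional, Union

NANOS_PER_SEC = 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class Timestamp:
    """Hypothetical stand-in for an output Timestamp(Nanosecond, tz) value."""
    nanos: int
    tz: Optional[str]


@dataclass
class TimestampMillis:
    """Hypothetical stand-in for an input Timestamp(Millisecond, tz) value."""
    millis: int
    tz: Optional[str]


def to_timestamp(value: Union[str, int, TimestampMillis], local_tz: tzinfo) -> Timestamp:
    """Sketch of a one-argument to_timestamp whose default output is UTC-based."""
    # Timestamp column: only the unit changes (millis -> nanos); the tz is kept.
    if isinstance(value, TimestampMillis):
        return Timestamp(value.millis * 1_000_000, value.tz)
    # int64: ambiguous in the discussion; this sketch assumes UTC epoch seconds.
    if isinstance(value, int):
        return Timestamp(value * NANOS_PER_SEC, "UTC")
    # String: honor a trailing "Z" or a +/-HH:MM offset if present...
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    # ...otherwise interpret the wall-clock time in the (assumed) local zone.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=local_tz)
    # Integer arithmetic on the timedelta avoids float rounding in dt.timestamp().
    delta = dt - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return Timestamp(seconds * NANOS_PER_SEC + delta.microseconds * 1_000, "UTC")


if __name__ == "__main__":
    eastern_daylight = timezone(timedelta(hours=-4))  # fixed -04:00 offset
    print(to_timestamp("2021-07-22T12:20:00", eastern_daylight))
    # Timestamp(nanos=1626970800000000000, tz='UTC')
```

Passing `local_tz` explicitly keeps the ambiguous no-designation case visible at the call site. Whether that zone should come from the server or from an explicit argument is exactly the open question in this thread.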
> On Jul 22, 2021, at 12:20 PM, Andrew Lamb ***@***.***> wrote:
>
> > TO_TIMESTAMP_UTC
> > TO_TIMESTAMP_LOCALTIME
>
> Yes, this is what I thought @velvia was more or less proposing, but instead of those names, use a second optional argument:
>
> to_timestamp(.., 'UTC') --> TO_TIMESTAMP_UTC(..)
> to_timestamp(..) --> TO_TIMESTAMP_LOCALTIME(..)
