velvia commented on issue #686:
URL: 
https://github.com/apache/arrow-datafusion/issues/686#issuecomment-885231416


   Hi folks,
   
   Excellent discussion; I need to go back and reread the detailed scenarios 
that Andrew and others laid out.
   
   I did propose a second argument to the to_timestamp() functions, though I’m 
slightly leaning towards a separate function dedicated to timezone conversion 
or designation, mainly for clarity.
   
   To help us think through it though, let’s take the default case where 
`to_timestamp()` does NOT have a second argument, which most people (definitely 
those coming from other SQL dialects) would omit.
   
   Arrow/DataFusion must know the output type currently, so we have to pick 
one.  Say it is Timestamp(_, UTC).  Would this be correct?  (UTC seems to be 
the only 
   
   The cases are:
   - string, with no clear timezone designation.  In this case, the local 
(server) timezone might be used to interpret it, and the result could then be 
converted to a UTC-based timestamp.  There is some ambiguity, but one can argue 
that if the original string was UTC it should have “Z” or an appropriate offset 
anyway.
   - string, with timezone designation (+/- offset, or Z).  In this case we get 
the timezone from the string and can convert the output to be UTC-based.
   - int64/uint64.  In this case, this is a numeric timestamp from epoch, but 
there is again ambiguity.  Do we treat it as local timezone based or UTC based?
   - Timestamp column.  If the source is already a timestamp column, the user 
is probably looking to convert units (say millis to nanos) and preserve the 
timezone.  This is a case where having a fixed output type does not really work 
well, but for this case we can actually specify that DataFusion produce an 
array with the same timezone as the input, so it’s fine.
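   
   To make the four cases concrete, here is a rough Python sketch of the 
dispatch described above.  It is not DataFusion code: the function name 
`to_timestamp_utc`, the fixed `SERVER_TZ` (UTC-7, chosen only so the example is 
deterministic), and the choice to read integers as UTC epoch seconds are all 
assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the server's local timezone; fixed at UTC-7 so the
# example is deterministic. A real engine would take this from configuration.
SERVER_TZ = timezone(timedelta(hours=-7))


def to_timestamp_utc(value, server_tz=SERVER_TZ):
    """Illustrative only: handle each input case from the list above."""
    if isinstance(value, datetime):
        # Timestamp input: return it unchanged, keeping the input's timezone.
        return value
    if isinstance(value, int):
        # Integer input: ASSUMED to be seconds since the epoch, read as UTC.
        # Reading it as server-local time is the other option discussed.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, str):
        # Older Python versions reject a trailing "Z" in fromisoformat().
        text = value[:-1] + "+00:00" if value.endswith("Z") else value
        parsed = datetime.fromisoformat(text)
        if parsed.tzinfo is None:
            # No designation in the string: interpret in the server timezone.
            parsed = parsed.replace(tzinfo=server_tz)
        # Normalize the result to UTC regardless of the source offset.
        return parsed.astimezone(timezone.utc)
    raise TypeError(f"unsupported input type: {type(value).__name__}")
```

   With this sketch, "2021-07-22T12:20:00" is read as server-local time and 
shifted to UTC, while "2021-07-22T12:20:00Z" and "2021-07-22T12:20:00+02:00" 
take their offset from the string itself.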
   
   I apologize in advance if I’m rehashing discussions from earlier in the 
thread; I still need to catch up.
   
   > On Jul 22, 2021, at 12:20 PM, Andrew Lamb ***@***.***> wrote:
   > 
   > 
   > TO_TIMESTAMP_UTC
   > TO_TIMESTAMP_LOCALTIME
   > Yes, this is what I thought @velvia <https://github.com/velvia> was more 
or less proposing, but instead of those names, use a second optional argument
   > 
   > to_timestamp(.., 'UTC') --> TO_TIMESTAMP_UTC(..)
   > to_timestamp(..) --> TO_TIMESTAMP_LOCALTIME(..)
   
   



