[ https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426142#comment-16426142 ]

Jesus Camacho Rodriguez commented on HIVE-12192:
------------------------------------------------

bq. How similar? Is it really the same as the SQL (standard) timestamp, i.e. a 
"Record" of year/month/day/hour/minute/second[/fraction]? Did these semantics 
change over time?
They should be equal, and this has not changed for some time.
As I mentioned, end users should see no visible difference with this patch, 
except for bugs such as the one mentioned in the description of the issue. The 
patch may also fix other complicated scenarios, e.g. query execution across 
multiple clusters with different time zones, but I am not sure Hive supports 
that right now in any case.
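
A minimal java.time sketch of the DST bug from the issue description (illustrative only, not code from the patch): anchoring the zone-less literal '2015-03-08 02:10:00.101' in America/Los_Angeles pushes it forward across the spring-forward gap, while anchoring it in UTC leaves the wall-clock value untouched. The class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class: shows how the choice of underlying zone affects a
// TIMESTAMP WITHOUT TIME ZONE value that falls into a DST gap.
final class DstGapDemo {

    // Anchor a wall-clock value in `zone`, then read the wall clock back.
    // For a local time inside a DST gap, atZone() moves it later by the gap length.
    static LocalDateTime roundTrip(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toLocalDateTime();
    }

    public static void main(String[] args) {
        LocalDateTime literal = LocalDateTime.of(2015, 3, 8, 2, 10, 0, 101_000_000);
        // Session-zone semantics: 02:10 does not exist in Los Angeles that night.
        System.out.println(roundTrip(literal, ZoneId.of("America/Los_Angeles"))); // 2015-03-08T03:10:00.101
        // UTC has no DST, so the value survives unchanged.
        System.out.println(roundTrip(literal, ZoneOffset.UTC)); // 2015-03-08T02:10:00.101
    }
}
```

The first line reproduces the shifted result shown in the issue's `{code}` block; the second is the behavior the issue asks for.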

bq. Did you also consider using a different representation, like 
java.time.LocalDateTime? (if this representation is indeed applicable)
This is precisely what the patch attached to this issue does; you can check it 
above.
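
For illustration, a sketch of what a LocalDateTime-based value looks like: it is just the year/month/day/hour/minute/second/fraction record, so a SQL function like year() becomes a plain field read with no zone involved. The class name and the literal pattern (millisecond fraction only) are assumptions of this sketch, not taken from the patch.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: TIMESTAMP WITHOUT TIME ZONE as a zone-free field record.
final class LocalTimestampFields {

    // Assumed literal format: "yyyy-MM-dd HH:mm:ss" with an optional 3-digit fraction.
    private static final DateTimeFormatter LITERAL =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss[.SSS]");

    static LocalDateTime parse(String literal) {
        return LocalDateTime.parse(literal, LITERAL);
    }

    // SQL year(): a plain field read, no time zone involved.
    static int year(LocalDateTime ts) {
        return ts.getYear();
    }

    public static void main(String[] args) {
        LocalDateTime ts = parse("2015-03-08 02:10:00.101");
        System.out.println(year(ts) + " " + ts.getHour() + ":" + ts.getMinute()); // 2015 2:10
    }
}
```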

bq. Do you happen to know how it is handled for other file types? Parquet, RC 
binary, RC text, textfile?
If I remember correctly, text-based formats persist the string representation, 
e.g. '1970-01-01 00:00:00'. I do not remember how other formats handle the 
mismatch, but if they use a long representation, I would expect them to convert 
the timestamp from the system time zone to UTC when writing to disk from Hive, 
and from UTC back to the current system time zone when reading from disk into 
Hive.
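
The convert-on-write/convert-on-read scheme described above can be sketched as follows (an assumption for illustration, not verified against any particular format's writer). Encoding relative to the system zone round-trips only when reader and writer share that zone, which is also why the multi-cluster scenario is fragile; encoding relative to UTC removes the dependency. All names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical sketch of two ways to persist a zone-less timestamp as a long.
final class TimestampLongEncoding {

    // (a) Writer interprets the wall clock in its system zone; the long is the UTC instant.
    static long writeInZone(LocalDateTime ts, ZoneId systemZone) {
        return ts.atZone(systemZone).toInstant().toEpochMilli();
    }

    // (a) Reader converts the UTC instant back into *its own* system zone.
    static LocalDateTime readInZone(long epochMillis, ZoneId systemZone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), systemZone);
    }

    // (b) UTC as the convenience zone: the encoding ignores any system zone.
    static long writeUtc(LocalDateTime ts) {
        return ts.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    static LocalDateTime readUtc(long epochMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2018, 4, 5, 10, 0);
        ZoneId la = ZoneId.of("America/Los_Angeles");
        ZoneId tokyo = ZoneId.of("Asia/Tokyo");
        System.out.println(readInZone(writeInZone(ts, la), la));    // 2018-04-05T10:00
        System.out.println(readInZone(writeInZone(ts, la), tokyo)); // 2018-04-06T02:00
        System.out.println(readUtc(writeUtc(ts)));                  // 2018-04-05T10:00
    }
}
```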


> Hive should carry out timestamp computations in UTC
> ---------------------------------------------------
>
>                 Key: HIVE-12192
>                 URL: https://issues.apache.org/jira/browse/HIVE-12192
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Ryan Blue
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>              Labels: timestamp
>         Attachments: HIVE-12192.patch
>
>
> Hive currently uses the "local" time of a java.sql.Timestamp to represent the 
> SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use 
> {{Timestamp#getYear()}} and similar methods to implement SQL functions like 
> {{year}}.
> When the SQL session's time zone is a DST zone, such as America/Los_Angeles, 
> which alternates between PST and PDT, some times cannot be represented 
> because the effective zone skips them.
> {code}
> hive> select TIMESTAMP '2015-03-08 02:10:00.101';
> 2015-03-08 03:10:00.101
> {code}
> Using UTC instead of the SQL session time zone as the underlying zone for a 
> java.sql.Timestamp avoids this bug, while still returning correct values for 
> {{getYear}} etc. Using UTC as the convenience representation (timestamp 
> without time zone has no real zone) would make timestamp calculations more 
> consistent and avoid similar problems in the future.
> Notably, this would break the {{unix_timestamp}} UDF, which specifies that the 
> result is with respect to ["the default timezone and default 
> locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions].
> That function would need to be updated to use the 
> {{System.getProperty("user.timezone")}} zone.
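
A hypothetical sketch of how a unix_timestamp-style computation could interpret a zone-less timestamp in an explicitly chosen zone once the internal representation no longer carries the session zone. The helper names are illustrative, not Hive's UDF API; `ZoneId.systemDefault()` stands in for the JVM default zone, which is normally initialized from the `user.timezone` property.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helpers, not Hive's UDF code: a unix_timestamp-style value for a
// zone-less timestamp, with the interpretation zone passed explicitly.
final class UnixTimestampSketch {

    // Seconds since 1970-01-01T00:00Z for `ts` read as wall-clock time in `zone`.
    static long unixTimestamp(LocalDateTime ts, ZoneId zone) {
        return ts.atZone(zone).toEpochSecond();
    }

    // "Default timezone" variant: ZoneId.systemDefault() mirrors the JVM default
    // TimeZone, which is normally initialized from the user.timezone property.
    static long unixTimestampInDefaultZone(LocalDateTime ts) {
        return unixTimestamp(ts, ZoneId.systemDefault());
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2015, 3, 8, 1, 0);
        // 01:00 PST (UTC-8) is 09:00 UTC.
        System.out.println(unixTimestamp(ts, ZoneId.of("America/Los_Angeles"))); // 1425805200
        System.out.println(unixTimestampInDefaultZone(ts)); // depends on the JVM default zone
    }
}
```

Passing the zone explicitly keeps the conversion to epoch seconds as the only place where a time zone enters the computation.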


