[ 
https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426302#comment-16426302
 ] 

Haozhun Jin edited comment on HIVE-12192 at 4/4/18 11:02 PM:
-------------------------------------------------------------

[~jcamachorodriguez], thank you for your patient answer. Please bear with us 
for a little bit more.

What do you think after reading the two experiments below?
h2. Experiment 1

I conducted this experiment once before. And this is why I'm under the 
impression that Hive Timestamp type means Instant. I redid the experiment today 
in both 1.2.1 and 2.6.3. The result is the same. Zone is America/Los_Angeles.

The table below summarizes the outcome: 
[raw|https://gist.github.com/haozhun/03cd09b3fa2456271f2e01759c9c1b8e]

 
||Query||Actual||expected LocalDateTime||expect Instant||
|current_timestamp()|2018-04-04 14:51|2018-04-04 14:51|2018-04-04 14:51|
|current_timestamp()
- interval '2880' hour|2017-12-05 13:51|2018-12-05 14:51|2017-12-05 13:51|
h2. Experiment 2

After reading your comments about ORC, I conducted some experiments to see how 
other file formats behave. The result is not what I was expecting. I did the 
experiment in 2.6.3.

The table below summarizes the outcome: 
[raw|[https://gist.github.com/haozhun/2e4b7c52bf3c56c03ed06e1ad895e198]|https://gist.github.com/haozhun/2e4b7c52bf3c56c03ed06e1ad895e198])]

 
||(server +05:45)||ORC||RCBinary||RCText||Text||
|Insert (client -07:00)|06:00:00|06:00:00|06:00:00|06:00:00|
|Read 1 (client -07:00)|18:45:00|06:00:00|18:45:00|18:45:00|
|Read 2 (client -04:00)|18:45:00|09:00:00|18:45:00|18:45:00|
h2. Summary

This is my take away
 * experiment 1: Hive timestamp type has Instant semantics. If it's internal 
representation is changed from java.sql.Timestamp to java.time.LocalDateTime, 
it will be a user-visible behavior change.
 * experiment 2: Do not use Hive in a zone different from the server's (given 
insert and read does not round trip in a single hive cli session). Hopefully, 
that's indeed how every one uses Hive. In that case, it does not matter whether 
experiment 2 indicates Instant or LocalDateTime.

 


was (Author: haozhun):
[~jcamachorodriguez], thank you for your patient answer. Please bear with us 
for a little bit more.

What do you think after reading the two experiments below?
h2. Experiment 1

I conducted this experiment once before. And this is why I'm under the 
impression that Hive Timestamp type means Instant. I redid the experiment today 
in both 1.2.1 and 2.6.3. The result is the same. Zone is America/Los_Angeles.

The table below summarizes the outcome: 
[raw|https://gist.github.com/haozhun/03cd09b3fa2456271f2e01759c9c1b8e]

 
||Query||Actual||expected LocalDateTime||expect Instant||
|current_timestamp()|2018-04-04 14:51|2018-04-04 14:51|2018-04-04 14:51|
|current_timestamp() - interval '2880' hour|2017-12-05 13:51|2018-12-05 
14:51|2017-12-05 13:51|
h2. Experiment 2

After reading your comments about ORC, I conducted some experiments to see how 
other file formats behave. The result is not what I was expecting. I did the 
experiment in 2.6.3.

The table below summarizes the outcome: 
[raw|[https://gist.github.com/haozhun/2e4b7c52bf3c56c03ed06e1ad895e198]|https://gist.github.com/haozhun/2e4b7c52bf3c56c03ed06e1ad895e198])]

 
||(server +05:45)||ORC||RCBinary||RCText||Text||
|Insert (client -07:00)|06:00:00|06:00:00|06:00:00|06:00:00|
|Read 1 (client -07:00)|18:45:00|06:00:00|18:45:00|18:45:00|
|Read 2 (client -04:00)|18:45:00|09:00:00|18:45:00|18:45:00|
h2. Summary

This is my take away
 * experiment 1: Hive timestamp type has Instant semantics. If it's internal 
representation is changed from java.sql.Timestamp to java.time.LocalDateTime, 
it will be a user-visible behavior change.
 * experiment 2: Do not use Hive in a zone different from the server's (given 
insert and read does not round trip in a single hive cli session). Hopefully, 
that's indeed how every one uses Hive. In that case, it does not matter whether 
experiment 2 indicates Instant or LocalDateTime.

 

> Hive should carry out timestamp computations in UTC
> ---------------------------------------------------
>
>                 Key: HIVE-12192
>                 URL: https://issues.apache.org/jira/browse/HIVE-12192
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Ryan Blue
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>              Labels: timestamp
>         Attachments: HIVE-12192.patch
>
>
> Hive currently uses the "local" time of a java.sql.Timestamp to represent the 
> SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use 
> {{Timestamp#getYear()}} and similar methods to implement SQL functions like 
> {{year}}.
> When the SQL session's time zone is a DST zone, such as America/Los_Angeles 
> that alternates between PST and PDT, there are times that cannot be 
> represented because the effective zone skips them.
> {code}
> hive> select TIMESTAMP '2015-03-08 02:10:00.101';
> 2015-03-08 03:10:00.101
> {code}
> Using UTC instead of the SQL session time zone as the underlying zone for a 
> java.sql.Timestamp avoids this bug, while still returning correct values for 
> {{getYear}} etc. Using UTC as the convenience representation (timestamp 
> without time zone has no real zone) would make timestamp calculations more 
> consistent and avoid similar problems in the future.
> Notably, this would break the {{unix_timestamp}} UDF that specifies the 
> result is with respect to ["the default timezone and default 
> locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions].
>  That function would need to be updated to use the 
> {{System.getProperty("user.timezone")}} zone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to