[ 
https://issues.apache.org/jira/browse/HIVE-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394991#comment-15394991
 ] 

Rui Li commented on HIVE-14305:
-------------------------------

Thanks [~xuefuz]. But I think SPARK-16078 doesn't solve the problem here. From 
the PR's description:
bq. This PR will do the conversion based on human time (in local timezone), it 
should return same result in whatever timezone. But because the mapping from 
absolute timestamp to human time is not exactly one-to-one mapping, it will 
still return wrong result in some timezone (also in the beginning or ending of 
DST).
bq. This PR is kind of the best effort fix. In long term, we should make the 
TimestampType be timezone aware to fix this totally.

IIUC, Timestamp extends Date, which stores milliseconds since the epoch. This is 
essentially the same as Spark's approach, except that Spark uses microseconds.
The problem is that when a local timezone with DST interprets this "time from 
epoch", the result can differ from what is expected. I think this is why Spark 
plans to make timestamps timezone-aware in the long term. Other possible 
solutions are:
# Set the JVM-wide timezone to UTC, as [~rdblue] suggested. We need to figure 
out how to handle timezone switches in this case.
# Use something less open to interpretation, e.g. a String, to represent the 
timestamp. We need to be careful with operations on a string timestamp, e.g. 
compareTo, equals, hashCode, etc. This can also be less efficient because a 
string takes more bytes to store.
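
To illustrate the interpretation problem, here is a minimal sketch in plain Java. It is not Hive's actual code path: the class name DstGapDemo and the helper roundTrip are hypothetical, and I pin the zone on a SimpleDateFormat instead of relying on the JVM default so the result doesn't depend on the machine. It parses a wall-clock string into "time from epoch" and formats it back in the same zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical demo class, not part of Hive.
final class DstGapDemo {

    // Parse a wall-clock string in the given zone into an epoch-based Date,
    // then format that Date back in the same zone.
    static String roundTrip(String wallClock, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            Date instant = fmt.parse(wallClock); // milliseconds since epoch
            return fmt.format(instant);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + wallClock, e);
        }
    }

    public static void main(String[] args) {
        // 02:01 on 2005-04-03 falls in the DST gap in America/Los_Angeles,
        // so the lenient parser shifts it forward by one hour.
        System.out.println(roundTrip("2005-04-03 02:01:00", "America/Los_Angeles"));
        // prints 2005-04-03 03:01:00

        // UTC has no DST, so the same string survives the round trip.
        System.out.println(roundTrip("2005-04-03 02:01:00", "UTC"));
        // prints 2005-04-03 02:01:00
    }
}
```

The first result matches the wrong value reported in this issue, and the second is what we'd get if everything were interpreted in UTC, which is the idea behind option 1.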

> To/From UTC timestamp may return incorrect result because of DST
> ----------------------------------------------------------------
>
>                 Key: HIVE-14305
>                 URL: https://issues.apache.org/jira/browse/HIVE-14305
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> If the machine's local timezone involves DST, the UDFs return incorrect 
> results.
> For example:
> {code}
> select to_utc_timestamp('2005-04-03 02:01:00','UTC');
> {code}
> returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 
> 02:01:00}}.
> {code}
> select to_utc_timestamp('2005-04-03 10:01:00','Asia/Shanghai');
> {code}
> returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 
> 02:01:00}}.



