[
https://issues.apache.org/jira/browse/HIVE-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394991#comment-15394991
]
Rui Li commented on HIVE-14305:
-------------------------------
Thanks [~xuefuz]. But I think SPARK-16078 doesn't solve the problem here. From
the PR's description:
bq. This PR will do the conversion based on human time (in local timezone), it
should return same result in whatever timezone. But because the mapping from
absolute timestamp to human time is not exactly one-to-one mapping, it will
still return wrong result in some timezone (also in the beginning or ending of
DST).
bq. This PR is kind of the best effort fix. In long term, we should make the
TimestampType be timezone aware to fix this totally.
IIUC, Timestamp extends Date, which stores the milliseconds since the epoch.
This is essentially the same as Spark's approach, which uses microseconds.
The problem is that when a local timezone with DST interprets this "time from
epoch", the result can differ from what is expected. I think this is why Spark
plans to make timestamps timezone-aware in the long term. Other possible
solutions are:
# Set the JVM-wide timezone to UTC, as [~rdblue] suggested. We need to figure
out how to handle timezone switches in this case.
# Use something less open to interpretation, e.g. String, to represent
timestamps. We need to be careful with operations on a string timestamp, e.g.
compareTo, equals, hashCode, etc. This can also be less efficient because a
string takes more bytes to store.
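To illustrate the interpretation problem, here is a minimal sketch. It uses java.time as a stand-in and is not Hive's actual code path. The class and method names are hypothetical, and America/Los_Angeles is only an assumed example of a DST zone, since the ticket does not say which local timezone the machine uses. In that zone, 02:01 on 2005-04-03 falls in the DST gap, so the wall-clock value cannot be represented and is shifted forward, while a conversion that never touches the local zone gives the expected answer.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class, for illustration only.
class DstGapDemo {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Interpret a wall-clock string in the given zone and return the
    // wall-clock time that the zone actually resolves it to.
    static String resolveInZone(String wallClock, String zoneId) {
        LocalDateTime local = LocalDateTime.parse(wallClock, FMT);
        // atZone moves a time inside a DST gap later by the gap length.
        return local.atZone(ZoneId.of(zoneId)).toLocalDateTime().format(FMT);
    }

    // Convert a wall-clock string from the given zone to UTC without
    // going through any JVM-default (local) timezone.
    static String toUtc(String wallClock, String fromZone) {
        LocalDateTime local = LocalDateTime.parse(wallClock, FMT);
        return local.atZone(ZoneId.of(fromZone))
                .withZoneSameInstant(ZoneId.of("UTC"))
                .toLocalDateTime()
                .format(FMT);
    }

    public static void main(String[] args) {
        String input = "2005-04-03 02:01:00";
        // Assumed DST zone: the input is in the gap and gets shifted.
        System.out.println(resolveInZone(input, "America/Los_Angeles")); // 2005-04-03 03:01:00
        // UTC has no DST, so the value round-trips unchanged.
        System.out.println(resolveInZone(input, "UTC"));                 // 2005-04-03 02:01:00
        // Second example from the issue, computed without the local zone.
        System.out.println(toUtc("2005-04-03 10:01:00", "Asia/Shanghai")); // 2005-04-03 02:01:00
    }
}
```

The shifted value in the first call matches the incorrect result reported in the issue, which is consistent with the local DST zone being involved somewhere in the conversion.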
> To/From UTC timestamp may return incorrect result because of DST
> ----------------------------------------------------------------
>
> Key: HIVE-14305
> URL: https://issues.apache.org/jira/browse/HIVE-14305
> Project: Hive
> Issue Type: Sub-task
> Reporter: Rui Li
> Assignee: Rui Li
>
> If the machine's local timezone involves DST, the UDFs return incorrect
> results.
> For example:
> {code}
> select to_utc_timestamp('2005-04-03 02:01:00','UTC');
> {code}
> returns {{2005-04-03 03:01:00}}. The correct result should be
> {{2005-04-03 02:01:00}}.
> {code}
> select to_utc_timestamp('2005-04-03 10:01:00','Asia/Shanghai');
> {code}
> returns {{2005-04-03 03:01:00}}. The correct result should be
> {{2005-04-03 02:01:00}}.