Oliver Ricken created HIVE-21733:
------------------------------------
Summary: 'from_unixtime' does not recognise timezone
Key: HIVE-21733
URL: https://issues.apache.org/jira/browse/HIVE-21733
Project: Hive
Issue Type: Bug
Components: Hive, hpl/sql
Affects Versions: 3.1.0
Reporter: Oliver Ricken
Dear experts,
I try to convert a unix timestamp to a date using 'from_unixtime' in Hive 3.1
do not get the correct result in my local timezone but in UTC. As I have a Hive
1.2 system avaialble also, I can state that in that version, the issue does not
exist.
I performed extensive tests on systems running HDP 3.1 as well as HDP 2.6.
For the detailled versions of the HDP stack and its components, see the beeline
results below.
I start from the European timezone "Europe/Berlin" or "CEST" (c.f. the date of
the post) set on all my system:
{code:java}
[user@host ~]$ date
Wed May 15 13:48:05 CEST 2019
[user@host ~]${code}
In a next step, I launch beeline and use the following timestamp as a
test-value:
1557921076
(corresponding to Wed May 15 13:51:16 2019 CEST and Wed May 15 11:51:16 2019
UTC)
On the HDP 2.6 system:
{code:java}
Connecting to
jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
0: jdbc:hive2://host.domain.com:2181,host> select from_unixtime(1557921076);
_c0
2019-05-15 13:51:16
1 row selected (0.373 seconds){code}
As you can see, using HDP 2.6 with Hive 1.2 the conversion works as expected.
The timestamp is cast into a time in the local timezone. Very nice!
On the HDP 3.1 system:
{code:java}
[user@host ~]$ beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to
jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/default;principal=hive/[email protected];serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
19/05/15 13:55:36 [main]: INFO jdbc.HiveConnection: Connected to
host.domain.com:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
0: jdbc:hive2://host.domain.com:2181,host.n> select from_unixtime(1557921076);
DEBUG : Acquired the compile lock.
INFO : Compiling
command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514):
select from_unixtime(1557921076)
DEBUG : Encoding valid txns info
13519:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
txnid:13519
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0,
type:string, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); Time
taken: 0.492 seconds
INFO : Executing
command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514):
select from_unixtime(1557921076)
INFO : Completed executing
command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); Time
taken: 0.011 seconds
INFO : OK
DEBUG : Shutting down query select from_unixtime(1557921076)
+----------------------+
| _c0 |
+----------------------+
| 2019-05-15 11:51:16 |
+----------------------+
1 row selected (0.672 seconds){code}
Here, things go the wrong way, as described in the above post. The local
timezone is not recognised by Hive and the default of 'UTC' is chosen, i.e. an
offset of 2h is returned.
I was able to track the problem down a bit further. Doing the detour of
converting via 'from_unix_timestamp' where providing a timezone parameter is
possible, gives the correct result:
{code:java}
0: jdbc:hive2://host.domain.com:2181,host.n> select
from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin');
DEBUG : Acquired the compile lock.
INFO : Compiling
command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8):
select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
DEBUG : Encoding valid txns info
13520:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
txnid:13520
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0,
type:timestamp, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); Time
taken: 1.024 seconds
INFO : Executing
command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8):
select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
INFO : Completed executing
command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); Time
taken: 0.01 seconds
INFO : OK
DEBUG : Shutting down query select
from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
+------------------------+
| _c0 |
+------------------------+
| 2019-05-15 13:51:16.0 |
+------------------------+
1 row selected (1.187 seconds){code}
Giving 'CEST', however, leads to the observed problem from above:
{code:java}
0: jdbc:hive2://host.domain.com:2181,host.n> select
from_utc_timestamp(from_unixtime(1557921076), 'CEST');
DEBUG : Acquired the compile lock.
INFO : Compiling
command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa):
select from_utc_timestamp(from_unixtime(1557921076), 'CEST')
DEBUG : Encoding valid txns info
13522:13517:13517,13521:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
txnid:13522
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0,
type:timestamp, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); Time
taken: 0.396 seconds
INFO : Executing
command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa):
select from_utc_timestamp(from_unixtime(1557921076), 'CEST')
INFO : Completed executing
command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); Time
taken: 0.031 seconds
INFO : OK
DEBUG : Shutting down query select
from_utc_timestamp(from_unixtime(1557921076), 'CEST')
+------------------------+
| _c0 |
+------------------------+
| 2019-05-15 11:51:16.0 |
+------------------------+
1 row selected (0.547 seconds){code}
Although 'CEST' and 'Europe/Berlin' should be equivalent this time of the year,
the result differs. I assume that 'CEST' is simply not recognised by Hive and
the fall-back to UTC is used.
A last test with the European winter time 'CET' gives the following result:
{code:java}
0: jdbc:hive2://host.domain.com:2181,host.n> select
from_utc_timestamp(from_unixtime(1557921076), 'CET');
DEBUG : Acquired the compile lock.
INFO : Compiling
command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821):
select from_utc_timestamp(from_unixtime(1557921076), 'CET')
DEBUG : Encoding valid txns info
13525:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485,13517,13521,13523
txnid:13525
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0,
type:timestamp, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); Time
taken: 0.777 seconds
INFO : Executing
command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821):
select from_utc_timestamp(from_unixtime(1557921076), 'CET')
INFO : Completed executing
command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); Time
taken: 0.022 seconds
INFO : OK
DEBUG : Shutting down query select
from_utc_timestamp(from_unixtime(1557921076), 'CET')
+------------------------+
| _c0 |
+------------------------+
| 2019-05-15 13:51:16.0 |
+------------------------+
1 row selected (1.02 seconds){code}
Although 'CET' differs by 1 hour from 'CEST' and 'Europe/Berlin', the result is
correct. I assume that Hive realises that 'CET'-member timezones are actually
in 'CEST' this time of the year and corrects accordingly.
As nice of a work-around using 'Europe/Berlin' may look at a first glance, I
suspect things to go wrong when the actual timezone is 'CET' in winter.
I suspect 'from_unixtime' to recognise 'CET' as a valid timezone once it is the
timezone of the local system, resulting in a wrongly cast 'from_unixtime' in
the inner parentheses.
A summer-/winter-time independent work-around is the following:
{code:java}
0: jdbc:hive2://host.domain.com:2181,host.n> select
from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'Europe/Berlin');
DEBUG : Acquired the compile lock.
INFO : Compiling
command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed):
select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET')
DEBUG : Encoding valid txns info
13531:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
txnid:13531
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0,
type:timestamp, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); Time
taken: 0.453 seconds
INFO : Executing
command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed):
select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET')
INFO : Completed executing
command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); Time
taken: 0.009 seconds
INFO : OK
DEBUG : Shutting down query select from_utc_timestamp(cast(1557921076 as
BIGINT)*1000, 'Europe/Berlin')
+------------------------+
| _c0 |
+------------------------+
| 2019-05-15 13:51:16.0 |
+------------------------+
1 row selected (0.586 seconds){code}
Here, I avoid the usage of the timezone-sensitive 'from_unixtime' entirly by
casting multiplying the timestamp by a factor of 1000 to get a genuine
unixtimestamp with milisecond-precision (in contrast to the truncated one with
second-precision used before).
For this trick to work, casting the truncated timestamp as BIGINT is necessary
as the '* 1000'-operation would exceed the size of a regular INT.
This neat work-around, however, does not work on the HDP 2.6/Hive 1.2 system
any more, here, the the 2h offset is applied in the wrong direction. It is
hence not applicable independently of the Hive-Version, unfortunately.
I would like to understand where the original issue arises from, the ignorance
of Hive towards the 'CEST' timezone (in my case).
As it is working nicely for HDP 2.6 and Hive 1.2, I suspect a fix of this bug
is possible also for HDP 3.1 and Hive 3.1.
I am looking forward to some input from the expert community.
Cheers
Oliver
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)