Oliver Ricken created HIVE-21733:
------------------------------------

             Summary: 'from_unixtime' does not recognise timezone
                 Key: HIVE-21733
                 URL: https://issues.apache.org/jira/browse/HIVE-21733
             Project: Hive
          Issue Type: Bug
          Components: Hive, hpl/sql
    Affects Versions: 3.1.0
            Reporter: Oliver Ricken


Dear experts,

I try to convert a unix timestamp to a date using 'from_unixtime' in Hive 3.1 
do not get the correct result in my local timezone but in UTC. As I have a Hive 
1.2 system avaialble also, I can state that in that version, the issue does not 
exist.


I performed extensive tests on systems running HDP 3.1 as well as HDP 2.6.
For the detailled versions of the HDP stack and its components, see the beeline 
results below.

I start from the European timezone "Europe/Berlin" or "CEST" (c.f. the date of 
the post) set on all my system:


{code:java}
[user@host ~]$ date
Wed May 15 13:48:05 CEST 2019
[user@host ~]${code}

In a next step, I launch beeline and use the following timestamp as a 
test-value:
1557921076

(corresponding to Wed May 15 13:51:16 2019 CEST and  Wed May 15 11:51:16 2019 
UTC)

On the HDP 2.6 system:
{code:java}
Connecting to 
jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
0: jdbc:hive2://host.domain.com:2181,host> select from_unixtime(1557921076);
_c0
2019-05-15 13:51:16
1 row selected (0.373 seconds){code}

As you can see, using HDP 2.6 with Hive 1.2 the conversion works as expected. 
The timestamp is cast into a time in the local timezone. Very nice!

On the HDP 3.1 system:


{code:java}
[user@host ~]$ beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to 
jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/default;principal=hive/_h...@domain.com;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
19/05/15 13:55:36 [main]: INFO jdbc.HiveConnection: Connected to 
host.domain.com:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
0: jdbc:hive2://host.domain.com:2181,host.n> select from_unixtime(1557921076);
DEBUG : Acquired the compile lock.
INFO  : Compiling 
command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514): 
select from_unixtime(1557921076)
DEBUG : Encoding valid txns info 
13519:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
 txnid:13519
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); Time 
taken: 0.492 seconds
INFO  : Executing 
command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514): 
select from_unixtime(1557921076)
INFO  : Completed executing 
command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); Time 
taken: 0.011 seconds
INFO  : OK
DEBUG : Shutting down query select from_unixtime(1557921076)
+----------------------+
|         _c0          |
+----------------------+
| 2019-05-15 11:51:16  |
+----------------------+
1 row selected (0.672 seconds){code}

Here, things go the wrong way, as described in the above post. The local 
timezone is not recognised by Hive and the default of 'UTC' is chosen, i.e. an 
offset of 2h is returned.
I was able to track the problem down a bit further. Doing the detour of 
converting via 'from_unix_timestamp' where providing a timezone parameter is 
possible, gives the correct result:


{code:java}
0: jdbc:hive2://host.domain.com:2181,host.n> select 
from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin');
DEBUG : Acquired the compile lock.
INFO  : Compiling 
command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8): 
select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
DEBUG : Encoding valid txns info 
13520:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
 txnid:13520
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:timestamp, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); Time 
taken: 1.024 seconds
INFO  : Executing 
command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8): 
select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
INFO  : Completed executing 
command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); Time 
taken: 0.01 seconds
INFO  : OK
DEBUG : Shutting down query select 
from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
+------------------------+
|          _c0           |
+------------------------+
| 2019-05-15 13:51:16.0  |
+------------------------+
1 row selected (1.187 seconds){code}

Giving 'CEST', however, leads to the observed problem from above:


{code:java}
0: jdbc:hive2://host.domain.com:2181,host.n> select 
from_utc_timestamp(from_unixtime(1557921076), 'CEST');
DEBUG : Acquired the compile lock.
INFO  : Compiling 
command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa): 
select from_utc_timestamp(from_unixtime(1557921076), 'CEST')
DEBUG : Encoding valid txns info 
13522:13517:13517,13521:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
 txnid:13522
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:timestamp, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); Time 
taken: 0.396 seconds
INFO  : Executing 
command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa): 
select from_utc_timestamp(from_unixtime(1557921076), 'CEST')
INFO  : Completed executing 
command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); Time 
taken: 0.031 seconds
INFO  : OK
DEBUG : Shutting down query select 
from_utc_timestamp(from_unixtime(1557921076), 'CEST')
+------------------------+
|          _c0           |
+------------------------+
| 2019-05-15 11:51:16.0  |
+------------------------+
1 row selected (0.547 seconds){code}

Although 'CEST' and 'Europe/Berlin' should be equivalent this time of the year, 
the result differs. I assume that 'CEST' is simply not recognised by Hive and 
the fall-back to UTC is used.
A last test with the European winter time 'CET' gives the following result:


{code:java}
0: jdbc:hive2://host.domain.com:2181,host.n> select 
from_utc_timestamp(from_unixtime(1557921076), 'CET');
DEBUG : Acquired the compile lock.
INFO  : Compiling 
command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821): 
select from_utc_timestamp(from_unixtime(1557921076), 'CET')
DEBUG : Encoding valid txns info 
13525:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485,13517,13521,13523
 txnid:13525
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:timestamp, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); Time 
taken: 0.777 seconds
INFO  : Executing 
command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821): 
select from_utc_timestamp(from_unixtime(1557921076), 'CET')
INFO  : Completed executing 
command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); Time 
taken: 0.022 seconds
INFO  : OK
DEBUG : Shutting down query select 
from_utc_timestamp(from_unixtime(1557921076), 'CET')
+------------------------+
|          _c0           |
+------------------------+
| 2019-05-15 13:51:16.0  |
+------------------------+
1 row selected (1.02 seconds){code}

Although 'CET' differs by 1 hour from 'CEST' and 'Europe/Berlin', the result is 
correct. I assume that Hive realises that 'CET'-member timezones are actually 
in 'CEST' this time of the year and corrects accordingly.
As nice of a work-around using 'Europe/Berlin' may look at a first glance, I 
suspect things to go wrong when the actual timezone is 'CET' in winter.
I suspect 'from_unixtime' to recognise 'CET' as a valid timezone once it is the 
timezone of the local system, resulting in a wrongly cast 'from_unixtime' in 
the inner parentheses.
A summer-/winter-time independent work-around is the following:


{code:java}
0: jdbc:hive2://host.domain.com:2181,host.n> select 
from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'Europe/Berlin');
DEBUG : Acquired the compile lock.
INFO  : Compiling 
command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed): 
select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET')
DEBUG : Encoding valid txns info 
13531:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
 txnid:13531
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:timestamp, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); Time 
taken: 0.453 seconds
INFO  : Executing 
command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed): 
select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET')
INFO  : Completed executing 
command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); Time 
taken: 0.009 seconds
INFO  : OK
DEBUG : Shutting down query select from_utc_timestamp(cast(1557921076 as 
BIGINT)*1000, 'Europe/Berlin')
+------------------------+
|          _c0           |
+------------------------+
| 2019-05-15 13:51:16.0  |
+------------------------+
1 row selected (0.586 seconds){code}

Here, I avoid the usage of the timezone-sensitive 'from_unixtime' entirly by 
casting multiplying the timestamp by a factor of 1000 to get a genuine 
unixtimestamp with milisecond-precision (in contrast to the truncated one with 
second-precision used before).
For this trick to work, casting the truncated timestamp as BIGINT is necessary 
as the '* 1000'-operation would exceed the size of a regular INT.

This neat work-around, however, does not work on the HDP 2.6/Hive 1.2 system 
any more, here, the the 2h offset is applied in the wrong direction. It is 
hence not applicable independently of the Hive-Version, unfortunately.

I would like to understand where the original issue arises from, the ignorance 
of Hive towards the 'CEST' timezone (in my case).
As it is working nicely for HDP 2.6 and Hive 1.2, I suspect a fix of this bug 
is possible also for HDP 3.1 and Hive 3.1.

I am looking forward to some input from the expert community.

Cheers

Oliver



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to