[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967675#comment-16967675
 ] 

Attila Jeges commented on IMPALA-3933:
--------------------------------------

The Java TZ database and the IANA TZ database (used by the OS) have 
different binary formats, so making Impala use the Java TZ database is not 
a trivial task. 

We could package an IANA TZ database that is compatible with the current 
version of the Java TZ database and make it publicly available for Impala 
users. The problem with this approach is that time zone rules change 
frequently, and Java's TZ database gets updated from time to time (when the 
admin runs a system update), at which point the two databases would be out 
of sync again.
 
I haven't tested it yet, but there is a tool to convert the IANA TZ database 
to Java's TZ database: https://github.com/akashche/tzdbgen . Perhaps we should 
point users to this (or a similar) tool if they want to keep the two 
databases in sync. 




> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-3933
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3933
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: impala 2.3
>            Reporter: Adriano Simone
>            Priority: Minor
>
> TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause 
> data skew (incorrect conversion) on read for dates earlier than 1900 
> (not sure about the exact date).
> The following example was run on a server in the CEST time zone, so the 
> offset is GMT+1 for dates before 1900 (I'm not sure; I haven't checked the 
> exact starting date of DST computation) and GMT+2 when daylight saving 
> time is in effect.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From Impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> From Hive:
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From Impala:
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 10:34:45 |
> | 6    | 1949-04-15 10:34:45 |
> | 7    | 1753-04-15 11:34:45 |
> | 8    | 1752-04-15 11:34:45 |
> +------+---------------------+
> {code}
> The timestamps look correct and the DST differences are visible (Hive 
> inserted them in local time, but Impala shows them in UTC).
> From Impala, after setting the command-line argument 
> "--convert_legacy_hive_parquet_utc_timestamps=true":
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 12:34:45 |
> | 6    | 1949-04-15 12:34:45 |
> | 7    | 1753-04-15 12:51:05 |
> | 8    | 1752-04-15 12:51:05 |
> +------+---------------------+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.
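
The size of the skew in the quoted results can be checked with plain 
arithmetic. The following is a minimal sketch in Python that uses only the 
values from rows 3 and 7 of the two result tables; the helper name `skew` and 
the variable names are illustrative and are not part of Impala or Hive.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def skew(shown: str, reference: str) -> timedelta:
    """Return shown - reference for two 'YYYY-MM-DD HH:MM:SS' strings."""
    return datetime.strptime(shown, FMT) - datetime.strptime(reference, FMT)

# Values copied from the quoted result tables (rows 3 and 7).
inserted = "1753-04-15 12:34:45"        # value inserted from both engines
hive_as_utc = "1753-04-15 11:34:45"     # row 7 with the flag off
hive_converted = "1753-04-15 12:51:05"  # row 7 with the flag on

# Offset implied by the Hive write (local value minus stored value).
write_offset = skew(inserted, hive_as_utc)
# Offset implied by the Impala read with the flag on.
read_offset = skew(hive_converted, hive_as_utc)
# Net error relative to the value that was originally inserted.
net_skew = skew(hive_converted, inserted)

print(write_offset, read_offset, net_skew)
```

The write side implies a whole-hour offset, while the read side implies an 
offset of 1:16:20, leaving a net skew of 0:16:20. A non-whole-hour offset 
would be consistent with one database applying a local-mean-time style rule 
for early dates while the other applies a fixed standard offset, but that 
reading is an assumption and has not been verified against either database.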



--
This message was sent by Atlassian Jira
(v8.3.4#803005)