[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Rahn updated IMPALA-3933:
------------------------------
    Labels:   (was: timestamp)

> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-3933
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3933
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: impala 2.3
>            Reporter: Adriano Simone
>            Priority: Minor
>
> How the TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause 
> data skew (improper converting) upon the reading for dates earlier than 1900 
> (not sure about the exact date).
> The following example was run on a server which is in CEST timezone, thus the 
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't 
> checked the exact starting date of DST computation), and GMT+2 when summer 
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 10:34:45 |
> | 6    | 1949-04-15 10:34:45 |
> | 7    | 1753-04-15 11:34:45 |
> | 8    | 1752-04-15 11:34:45 |
> +------+---------------------+
> {code}
> The timestamps are looking good, the DST differences can be seen (hive 
> inserted it in local time, but impala shows it in UTC)
> From impala after setting the command line argument 
> "--convert_legacy_hive_parquet_utc_timestamps=true"
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 12:34:45 |
> | 6    | 1949-04-15 12:34:45 |
> | 7    | 1753-04-15 12:51:05 |
> | 8    | 1752-04-15 12:51:05 |
> +------+---------------------+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to