[
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Rahn updated IMPALA-3933:
------------------------------
Labels: timestamp (was: )
> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --------------------------------------------------------------------------
>
> Key: IMPALA-3933
> URL: https://issues.apache.org/jira/browse/IMPALA-3933
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: impala 2.3
> Reporter: Adriano Simone
> Priority: Minor
> Labels: timestamp
>
> How the TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause
> data skew (improper converting) upon the reading for dates earlier than 1900
> (not sure about the exact date).
> The following example was run on a server which is in CEST timezone, thus the
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't
> checked the exact starting date of DST computation), and GMT+2 when summer
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +------+---------------------+
> | col1 | myts |
> +------+---------------------+
> | 1 | 2016-04-15 12:34:45 |
> | 2 | 1949-04-15 12:34:45 |
> | 3 | 1753-04-15 12:34:45 |
> | 4 | 1752-04-15 12:34:45 |
> | 5 | 2016-04-15 10:34:45 |
> | 6 | 1949-04-15 10:34:45 |
> | 7 | 1753-04-15 11:34:45 |
> | 8 | 1752-04-15 11:34:45 |
> +------+---------------------+
> {code}
> The timestamps are looking good, the DST differences can be seen (hive
> inserted it in local time, but impala shows it in UTC)
> From impala after setting the command line argument
> "--convert_legacy_hive_parquet_utc_timestamps=true"
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +------+---------------------+
> | col1 | myts |
> +------+---------------------+
> | 1 | 2016-04-15 12:34:45 |
> | 2 | 1949-04-15 12:34:45 |
> | 3 | 1753-04-15 12:34:45 |
> | 4 | 1752-04-15 12:34:45 |
> | 5 | 2016-04-15 12:34:45 |
> | 6 | 1949-04-15 12:34:45 |
> | 7 | 1753-04-15 12:51:05 |
> | 8 | 1752-04-15 12:51:05 |
> +------+---------------------+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]