[
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967598#comment-16967598
]
Manish Maheshwari edited comment on IMPALA-3933 at 11/5/19 3:10 PM:
--------------------------------------------------------------------
Thanks [~attilaj], replies below:
1 - This is quite well known.
2 - Ok. So there is no solution possible for this at all until the customer
upgrades to CDP DC, when the dates will match.
3 - Ok, understood. Is there a plan for Impala to also use the Java TZ
database? I am guessing this is difficult to implement in the C++ BE. The
reason for this ask is to avoid debugging TZ issues caused by Java and OS
mismatches. Alternatively, can we package the Java TZ database and make it
publicly available / part of the Impala binary to make it easy to use?
> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --------------------------------------------------------------------------
>
> Key: IMPALA-3933
> URL: https://issues.apache.org/jira/browse/IMPALA-3933
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: impala 2.3
> Reporter: Adriano Simone
> Priority: Minor
>
> TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause
> data skew (improper conversion) on read for dates earlier than 1900
> (not sure about the exact date).
> The following example was run on a server in the CEST time zone, so the
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't
> checked the exact starting date of DST computation), and GMT+2 when summer
> daylight saving time applies.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From Impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> From Hive:
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From Impala:
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 10:34:45 |
> | 6    | 1949-04-15 10:34:45 |
> | 7    | 1753-04-15 11:34:45 |
> | 8    | 1752-04-15 11:34:45 |
> +------+---------------------+
> {code}
> The timestamps look correct and the DST difference is visible (Hive
> treated the inserted values as local time, while Impala shows the stored
> values in UTC).
> From Impala, after setting the command line argument
> "--convert_legacy_hive_parquet_utc_timestamps=true":
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 12:34:45 |
> | 6    | 1949-04-15 12:34:45 |
> | 7    | 1753-04-15 12:51:05 |
> | 8    | 1752-04-15 12:51:05 |
> +------+---------------------+
> {code}
> It seems that for rows 7 and 8 it is showing 12:51:05 instead of 11:34:45.
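
The size of the skew can be worked out from the values quoted in the description. The following is a minimal sketch in Python, not part of the original report: it takes row 7 as an example and derives the UTC offset each engine appears to have applied. The helper name `implied_offset` is hypothetical, and reading the unconverted Impala result as the stored UTC value is an assumption based on the description's own interpretation.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def implied_offset(local_text: str, utc_text: str) -> timedelta:
    """Return (local - UTC) for two wall-clock strings in FMT."""
    return datetime.strptime(local_text, FMT) - datetime.strptime(utc_text, FMT)


# Row 7 from the report (values copied from the result tables).
inserted = "1753-04-15 12:34:45"    # value inserted through Hive
stored_utc = "1753-04-15 11:34:45"  # Impala result without the flag (assumed to be UTC)
converted = "1753-04-15 12:51:05"   # Impala result with the flag enabled

# Offset Hive appears to have used when writing the value.
hive_offset = implied_offset(inserted, stored_utc)
# Offset Impala appears to have used when converting back on read.
impala_offset = implied_offset(converted, stored_utc)
# Difference between the two, which is the skew seen by the user.
skew = impala_offset - hive_offset

print(hive_offset)    # 1:00:00
print(impala_offset)  # 1:16:20
print(skew)           # 0:16:20
```

The two engines therefore seem to disagree by 16 minutes 20 seconds for this date. A non-whole-hour offset like this is what a local mean time entry in a time zone database usually looks like, which would fit the issue title, although the report itself does not confirm the cause.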