[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967598#comment-16967598
 ] 

Manish Maheshwari edited comment on IMPALA-3933 at 11/5/19 3:10 PM:
--------------------------------------------------------------------

Thanks [~attilaj], replies below:

1 - This is quite well known. 

2 - Ok. So there is no possible solution for this until the customer 
upgrades to CDP DC, when the dates will match.

3 - Ok, understood. Is there a plan for Impala to also use the Java TZ 
database? I am guessing this is difficult to implement in the C++ BE. The 
reason for this request is to avoid debugging TZ issues caused by a mismatch 
between Java and the OS. Alternatively, can we package the Java TZ database 
and make it publicly available, or part of the Impala binary, to make it easy 
to use?


> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-3933
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3933
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: impala 2.3
>            Reporter: Adriano Simone
>            Priority: Minor
>
> TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to skew 
> the data (incorrect conversion) on read for dates earlier than 1900 
> (not sure about the exact date).
> The following example was run on a server in the CEST time zone, so the 
> offset is GMT+1 for dates before 1900 (I'm not sure, I haven't 
> checked the exact starting date of DST computation), and GMT+2 when summer 
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From Impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> From Hive:
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From Impala:
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 10:34:45 |
> | 6    | 1949-04-15 10:34:45 |
> | 7    | 1753-04-15 11:34:45 |
> | 8    | 1752-04-15 11:34:45 |
> +------+---------------------+
> {code}
> The timestamps look good; the DST differences are visible (Hive 
> inserted them in local time, but Impala shows them in UTC).
> From Impala, after setting the command line argument 
> "--convert_legacy_hive_parquet_utc_timestamps=true":
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 12:34:45 |
> | 6    | 1949-04-15 12:34:45 |
> | 7    | 1753-04-15 12:51:05 |
> | 8    | 1752-04-15 12:51:05 |
> +------+---------------------+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.
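
To make the skew concrete, here is a minimal Python sketch that only does arithmetic on the values printed in the two result sets above. It computes the offset Impala appears to have applied to the Hive-written rows when converting the stored UTC value back to local time. The helper name applied_offset is hypothetical and exists only for this illustration. The 1:16:20 result for the 1753 and 1752 rows looks like a local mean time (LMT) offset from the OS time zone database rather than a whole-hour zone offset; as far as I know Europe/Budapest has an LMT of that value, but the server's zone is not stated in the report, so treat that as a guess.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def applied_offset(stored_utc: str, displayed_local: str) -> timedelta:
    """Offset that turns the stored UTC value into the displayed value.

    Hypothetical helper for illustration; not part of Impala or Hive.
    """
    return datetime.strptime(displayed_local, FMT) - datetime.strptime(stored_utc, FMT)


# Hive-written rows: (value shown without the flag, value shown with the flag),
# copied from the two result tables in the report.
rows = {
    5: ("2016-04-15 10:34:45", "2016-04-15 12:34:45"),
    6: ("1949-04-15 10:34:45", "1949-04-15 12:34:45"),
    7: ("1753-04-15 11:34:45", "1753-04-15 12:51:05"),
    8: ("1752-04-15 11:34:45", "1752-04-15 12:51:05"),
}

for col1, (utc_value, local_value) in rows.items():
    print(col1, applied_offset(utc_value, local_value))
```

Rows 5 and 6 come back with a whole two-hour offset, while rows 7 and 8 come back with 1:16:20. Hive stored those two rows using a one-hour offset (12:34:45 became 11:34:45), so the round trip is off by 16 minutes and 20 seconds.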


