[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963707#comment-16963707
 ] 

Manish Maheshwari commented on IMPALA-3933:
-------------------------------------------

This is the current behaviour on impala-2.11

Issue 1 - 
{code:java}
We will get the below error if a date < 1400/01/01 is inserted. WARNINGS: 
Parquet file 'hdfs://ns1/user/hive/warehouse/abc/000000_0_copy_5' column 'ts' 
contains an out of range timestamp. The valid date range is 
1400-01-01..9999-12-31. {code}
Issue 2 and workaround using Hive UDF's in Impala
{code:java}

1) Set these in Impala - 
-convert_legacy_hive_parquet_utc_timestamps=true
-use_local_tz_for_unix_timestamp_conversions=true

2) In Hive - 

beeline>create table abc(ts timestamp) stored as parquet;
beeline>insert into abc values ('1400-12-12 00:00:00');
beeline>insert into abc values ('1400-9-12 00:00:00');
beeline>insert into abc values ('1500-9-12 00:00:00');
beeline>insert into abc values ('1500-10-12 00:00:00');

beeline>select * from abc;
------------------------+abc.ts------------------------+
1400-12-12 00:00:00.0
1400-09-12 00:00:00.0
1500-09-12 00:00:00.0
1500-10-12 00:00:00.0
------------------------+

impala-shell>invalidate metadata;

impala-shell>select * from abc; 

## This is not the right output
---------------------ts---------------------
1400-09-21 08:00:00
1500-09-22 08:00:00
1400-12-21 08:00:00
1500-10-22 08:00:00
---------------------
Fetched 4 row(s) in 2.96s

## Now using the Hive UDF in Impala, copy hive-exec-1.1.0-cdh5.x.x.jar as 
/tmp/hive-udf.jar

create function hive_unixtime location '/tmp/hive-udf.jar' 
symbol='org.apache.hadoop.hive.ql.udf.UDFFromUnixTime' 

impala-shell> select hive_unixtime(unix_timestamp(ts),'yyyy/MM/dd HH:mm') as 
yyyy_mm_dd_hh_mm from abc;

------------------yyyy_mm_dd_hh_mm------------------
1400/09/12 00:00
1500/09/12 00:00
1400/12/12 00:00
1500/10/12 00:00
------------------
Fetched 4 row(s) in 0.43s
 {code}
 

 

> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-3933
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3933
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: impala 2.3
>            Reporter: Adriano Simone
>            Priority: Minor
>
> How the TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause 
> data skew (improper converting) upon the reading for dates earlier than 1900 
> (not sure about the exact date).
> The following example was run on a server which is in CEST timezone, thus the 
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't 
> checked the exact starting date of DST computation), and GMT+2 when summer 
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 10:34:45 |
> | 6    | 1949-04-15 10:34:45 |
> | 7    | 1753-04-15 11:34:45 |
> | 8    | 1752-04-15 11:34:45 |
> +------+---------------------+
> {code}
> The timestamps are looking good, the DST differences can be seen (hive 
> inserted it in local time, but impala shows it in UTC)
> From impala after setting the command line argument 
> "--convert_legacy_hive_parquet_utc_timestamps=true"
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 12:34:45 |
> | 6    | 1949-04-15 12:34:45 |
> | 7    | 1753-04-15 12:51:05 |
> | 8    | 1752-04-15 12:51:05 |
> +------+---------------------+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to