[ 
https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670906#comment-15670906
 ] 

Vitalii Diravka commented on DRILL-5034:
----------------------------------------

Note: It is no way to determine the value of the  
hive.parquet.timestamp.skip.conversion option from the parquet file.
The difference only in the relative shifting of hours of the timestamps values.

Drill, impala don't store this local timezone shift, but hive stores it by 
default.
And according to [parquet 
spec|https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#timestamp_millis]
 the parquet files don't keep a local timezone. 

> Select timestamp from hive generated parquet always return in UTC
> -----------------------------------------------------------------
>
>                 Key: DRILL-5034
>                 URL: https://issues.apache.org/jira/browse/DRILL-5034
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.9.0
>            Reporter: Krystal
>            Assignee: Vitalii Diravka
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data against a hive parquet table from drill automatically 
> converts the timestamp data to UTC. 
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +----------------------------------------------+
> |                    EXPR$0                    |
> +----------------------------------------------+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +----------------------------------------------+
> {code}
> data schema:
> {code}
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> `/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-23 20:03:58.0  |
> | null                   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> +------------------------+
> 5 rows selected (1.032 seconds)
> {code}
> If the user timzone is changed to UTC, then the timestamp data is returned in 
> UTC time.
> Using drill-1.9, the returned timestamps got converted to UTC eventhough the 
> user timezone is in PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +-------+---------------------------------------------------+
> |  ok   |                      summary                      |
> +-------+---------------------------------------------------+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +-------+---------------------------------------------------+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` 
> limit 5;
> +------------------------+
> |    create_timestamp    |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to