[
https://issues.apache.org/jira/browse/DRILL-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rahul Challapalli updated DRILL-4345:
-------------------------------------
Attachment: hive1_fewtypes_null.parquet
> Hive Native Reader reporting wrong results for timestamp column in hive
> generated parquet file
> ----------------------------------------------------------------------------------------------
>
> Key: DRILL-4345
> URL: https://issues.apache.org/jira/browse/DRILL-4345
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Hive, Storage - Parquet
> Reporter: Rahul Challapalli
> Priority: Critical
> Attachments: hive1_fewtypes_null.parquet
>
>
> git.commit.id.abbrev=1b96174
> The Hive plugin and the native Parquet reader return different results for
> the timestamp column of the same table, as shown below.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> use hive;
> +-------+-----------------------------------+
> |  ok   |              summary              |
> +-------+-----------------------------------+
> | true  | Default schema changed to [hive]  |
> +-------+-----------------------------------+
> 1 row selected (0.415 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select int_col, timestamp_col from
> hive1_fewtypes_null_parquet;
> +----------+------------------------+
> | int_col  |     timestamp_col      |
> +----------+------------------------+
> | 1        | null                   |
> | null     | 1997-01-02 00:00:00.0  |
> | 3        | null                   |
> | 4        | null                   |
> | 5        | 1997-02-10 17:32:00.0  |
> | 6        | 1997-02-11 17:32:01.0  |
> | 7        | 1997-02-12 17:32:01.0  |
> | 8        | 1997-02-13 17:32:01.0  |
> | 9        | null                   |
> | 10       | 1997-02-15 17:32:01.0  |
> | null     | 1997-02-16 17:32:01.0  |
> | 12       | 1897-02-18 17:32:01.0  |
> | 13       | 2002-02-14 17:32:01.0  |
> | 14       | 1991-02-10 17:32:01.0  |
> | 15       | 1900-02-16 17:32:01.0  |
> | 16       | null                   |
> | null     | 1897-02-16 17:32:01.0  |
> | 18       | 1997-02-16 17:32:01.0  |
> | null     | null                   |
> | 20       | 1996-02-28 17:32:01.0  |
> | null     | null                   |
> +----------+------------------------+
> 21 rows selected (0.368 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set
> `store.hive.optimize_scan_with_native_readers` = true;
> +-------+--------------------------------------------------------+
> |  ok   |                        summary                         |
> +-------+--------------------------------------------------------+
> | true  | store.hive.optimize_scan_with_native_readers updated.  |
> +-------+--------------------------------------------------------+
> 1 row selected (0.213 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select int_col, timestamp_col from
> hive1_fewtypes_null_parquet;
> +----------+------------------------+
> | int_col  |     timestamp_col      |
> +----------+------------------------+
> | 1        | null                   |
> | null     | 1997-01-02 00:00:00.0  |
> | 3        | 1997-02-10 17:32:00.0  |
> | 4        | null                   |
> | 5        | 1997-02-11 17:32:01.0  |
> | 6        | 1997-02-12 17:32:01.0  |
> | 7        | 1997-02-13 17:32:01.0  |
> | 8        | 1997-02-15 17:32:01.0  |
> | 9        | 1997-02-16 17:32:01.0  |
> | 10       | 1900-02-16 17:32:01.0  |
> | null     | 1897-02-16 17:32:01.0  |
> | 12       | 1997-02-16 17:32:01.0  |
> | 13       | 1996-02-28 17:32:01.0  |
> | 14       | 1997-01-02 00:00:00.0  |
> | 15       | 1997-01-02 00:00:00.0  |
> | 16       | 1997-01-02 00:00:00.0  |
> | null     | 1997-01-02 00:00:00.0  |
> | 18       | 1997-01-02 00:00:00.0  |
> | null     | 1997-01-02 00:00:00.0  |
> | 20       | 1997-01-02 00:00:00.0  |
> | null     | 1997-01-02 00:00:00.0  |
> +----------+------------------------+
> 21 rows selected (0.352 seconds)
> {code}
> DDL for the Hive table:
> {code}
> create external table hive1_fewtypes_null_parquet (
> int_col int,
> bigint_col bigint,
> date_col string,
> time_col string,
> timestamp_col timestamp,
> interval_col string,
> varchar_col string,
> float_col float,
> double_col double,
> bool_col boolean
> )
> stored as parquet
> location '/drill/testdata/hive_storage/hive1_fewtypes_null';
> {code}
> The underlying Parquet file is attached.
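
To see exactly which rows disagree between the two query outputs in the report, the timestamp_col values can be compared position by position. The following is only a minimal sketch in Python, not part of the original report: the lists are transcribed by hand from the two result sets (with `None` standing for null), the variable names are illustrative, and it assumes both queries return rows in the same order.

```python
# timestamp_col values transcribed from the report, in output order.
# None stands for SQL null. Names here are illustrative only.
hive_plugin = [
    None, "1997-01-02 00:00:00.0", None, None,
    "1997-02-10 17:32:00.0", "1997-02-11 17:32:01.0",
    "1997-02-12 17:32:01.0", "1997-02-13 17:32:01.0", None,
    "1997-02-15 17:32:01.0", "1997-02-16 17:32:01.0",
    "1897-02-18 17:32:01.0", "2002-02-14 17:32:01.0",
    "1991-02-10 17:32:01.0", "1900-02-16 17:32:01.0", None,
    "1897-02-16 17:32:01.0", "1997-02-16 17:32:01.0", None,
    "1996-02-28 17:32:01.0", None,
]
native_reader = [
    None, "1997-01-02 00:00:00.0", "1997-02-10 17:32:00.0", None,
    "1997-02-11 17:32:01.0", "1997-02-12 17:32:01.0",
    "1997-02-13 17:32:01.0", "1997-02-15 17:32:01.0",
    "1997-02-16 17:32:01.0", "1900-02-16 17:32:01.0",
    "1897-02-16 17:32:01.0", "1997-02-16 17:32:01.0",
    "1996-02-28 17:32:01.0",
] + ["1997-01-02 00:00:00.0"] * 8

# Collect (row position, hive plugin value, native reader value)
# for every position where the two readers disagree.
mismatches = [
    (row, expected, actual)
    for row, (expected, actual) in enumerate(
        zip(hive_plugin, native_reader), start=1
    )
    if expected != actual
]

for row, expected, actual in mismatches:
    print(f"row {row}: hive plugin={expected} native reader={actual}")
print(f"{len(mismatches)} of {len(hive_plugin)} rows differ")

# Null counts per reader, useful for spotting dropped nulls.
print("nulls (hive plugin):", hive_plugin.count(None))
print("nulls (native reader):", native_reader.count(None))
```

The null counts are printed separately because the native reader output contains far fewer nulls than the Hive plugin output, which may help narrow down where the two readers diverge.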