Krystal created DRILL-5381:
------------------------------

             Summary: convert_from(col, 'TIMESTAMP_IMPALA') returns incorrect timestamp if there are multiple nulls
                 Key: DRILL-5381
                 URL: https://issues.apache.org/jira/browse/DRILL-5381
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.9.0, 1.8.0, 1.10.0
            Reporter: Krystal
In drill-1.10, setting `store.parquet.reader.int96_as_timestamp`=true returns the expected data:

select voter_id,create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 15;
+-----------+------------------------+
| voter_id  |    create_timestamp    |
+-----------+------------------------+
| 1         | 2016-10-23 20:03:58.0  |
| 2         | null                   |
| 3         | 2016-09-09 12:01:18.0  |
| 4         | 2017-03-06 20:35:55.0  |
| 5         | 2017-01-20 22:32:43.0  |
| 6         | 2016-10-22 05:46:12.0  |
| 7         | 2016-09-19 10:21:29.0  |
| 8         | null                   |
| 9         | 2016-07-23 13:39:02.0  |
| 10        | 2017-01-28 17:27:19.0  |
| 11        | 2016-10-23 10:55:44.0  |
| 12        | 2016-06-07 22:44:03.0  |
| 13        | 2016-05-04 13:59:20.0  |
| 14        | 2016-11-08 17:20:14.0  |
| 15        | 2016-05-14 11:23:53.0  |
+-----------+------------------------+

However, setting `store.parquet.reader.int96_as_timestamp`=false returns incorrect timestamps starting at the second null value:

select voter_id,convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 15;
+-----------+------------------------+
| voter_id  |         EXPR$1         |
+-----------+------------------------+
| 1         | 2016-10-23 20:03:58.0  |
| 2         | null                   |
| 3         | 2016-09-09 12:01:18.0  |
| 4         | 2017-03-06 20:35:55.0  |
| 5         | 2017-01-20 22:32:43.0  |
| 6         | 2016-10-22 05:46:12.0  |
| 7         | 2016-09-19 10:21:29.0  |
| 8         | 2016-07-23 13:39:02.0  |
| 9         | 2016-10-23 10:55:44.0  |
| 10        | 2016-06-07 22:44:03.0  |
| 11        | 2016-05-04 13:59:20.0  |
| 12        | 2016-11-08 17:20:14.0  |
| 13        | 2016-05-14 11:23:53.0  |
| 14        | 2016-06-20 16:18:51.0  |
| 15        | 2016-09-09 10:02:28.0  |
+-----------+------------------------+

Notice that the timestamp for voter_id=9 shifts to voter_id=8, which is supposed to be null. All timestamps after voter_id=7 are incorrect: from voter_id=9 onward, each row shows the value that belongs two rows later (for example, voter_id=9 shows the value of voter_id=11), and the value for voter_id=10 (2017-01-28 17:27:19.0) does not appear at all.

This issue is also reproducible on drill-1.8.0 and drill-1.9.0.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)