[
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451891#comment-15451891
]
Vitalii Diravka commented on DRILL-4373:
----------------------------------------
[~rkins] As far as I can see, the error occurs because Drill and Hive use
different physical types for the timestamp logical type. Hive uses INT96 (to
get nanosecond precision), while Drill uses INT64 with the appropriate
timestamp annotation, as described in the [parquet
documentation|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md]
(used for microsecond or millisecond precision). So Drill stores timestamps
correctly, and Hive should be able to read such parquet files; see
https://issues.apache.org/jira/browse/HIVE-13435.
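To illustrate the difference, here is a minimal Python sketch that encodes one
instant both ways. It assumes the INT96 layout commonly written by Hive and
Impala (8 bytes of nanoseconds within the day followed by a 4-byte Julian day
number, both little-endian) and millisecond precision for INT64. The function
names are hypothetical and are not part of Drill or Hive.

```python
import struct
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01


def to_int64_millis(ts: datetime) -> int:
    """INT64 style: milliseconds since the Unix epoch (UTC)."""
    delta = ts - EPOCH
    return (delta.days * 86_400_000
            + delta.seconds * 1_000
            + delta.microseconds // 1_000)


def to_int96_bytes(ts: datetime) -> bytes:
    """INT96 style (assumed layout): nanos-of-day, then Julian day."""
    delta = ts - EPOCH
    julian_day = JULIAN_DAY_OF_UNIX_EPOCH + delta.days
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1_000
    # "<" = little-endian without padding: 8-byte int + 4-byte int = 12 bytes.
    return struct.pack("<qi", nanos_of_day, julian_day)


if __name__ == "__main__":
    ts = datetime(2016, 8, 31, 12, 0, 0, tzinfo=timezone.utc)
    print(to_int64_millis(ts))
    print(to_int96_bytes(ts).hex())
```

A reader that expects one of these encodings cannot interpret the other without
an explicit conversion, which matches the failure described in this issue.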
Another issue is that Drill can read Hive timestamps from parquet files, but
only by using the CONVERT_FROM function. By default Drill reads INT96 as
VARBINARY. As part of this jira I'm going to implement the ability for Drill to
interpret Hive timestamps in parquet files as timestamps implicitly by default,
with a session/system option to control this behavior (in case some other data
type is ever stored as INT96 in a parquet file).
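As a rough sketch of what such an implicit conversion has to do, the Python
below decodes a 12-byte INT96 value into a timestamp, with a flag that stands
in for the session/system option. It assumes the layout commonly written by
Hive and Impala (8 bytes of nanoseconds within the day, then a 4-byte Julian
day, both little-endian) and treats the value as UTC. The names
`int96_to_datetime` and `read_int96` are hypothetical and are not Drill APIs.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode an INT96 timestamp (assumed layout) into a UTC datetime."""
    if len(raw) != 12:
        raise ValueError("INT96 value must be exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Python datetimes hold microseconds, so sub-microsecond digits are dropped.
    return EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1_000)


def read_int96(raw: bytes, as_timestamp: bool = True):
    """Hypothetical switch: return a timestamp, or the raw bytes unchanged."""
    return int96_to_datetime(raw) if as_timestamp else raw


if __name__ == "__main__":
    raw = struct.pack("<qi", 43_200 * 10**9, 2457632)
    print(read_int96(raw))
    print(read_int96(raw, as_timestamp=False).hex())
```

With the flag off, the value comes back as raw bytes, which corresponds to the
current VARBINARY behavior and keeps other possible uses of INT96 readable.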
> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Hive, Storage - Parquet
> Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a
> hive table on top of the parquet file and use "timestamp" as the column type,
> drill fails to read the hive table through the hive storage plugin