[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451891#comment-15451891
 ] 

Vitalii Diravka commented on DRILL-4373:
----------------------------------------

[~rkins] As I see it, you get the error because Drill and Hive use different 
physical types for the timestamp logical type. Hive uses INT96 (to get 
nanosecond precision), while Drill uses INT64 with the appropriate logical-type 
annotation, as described in the [parquet 
documentation|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md],
 which covers microsecond or millisecond precision. So Drill stores 
timestamps correctly, and Hive should be able to read such parquet files: 
https://issues.apache.org/jira/browse/HIVE-13435.
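
To make the mismatch concrete, here is a minimal Python sketch of the two 
encodings for the same instant. It is an illustration only, not Drill or Hive 
code: the function names are hypothetical, and the INT96 layout (8 bytes of 
nanoseconds within the day followed by a 4-byte Julian day number, both 
little-endian) is the commonly used Hive/Impala convention, assumed here.

```python
import struct
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01


def to_int64_millis(ts: datetime) -> int:
    """INT64 with a millisecond annotation: milliseconds since the Unix epoch.

    `ts` must be timezone-aware.
    """
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


def to_int96(ts: datetime) -> bytes:
    """INT96 (assumed Hive/Impala layout): nanos of day + Julian day, 12 bytes."""
    delta = ts - EPOCH
    julian_day = JULIAN_DAY_OF_UNIX_EPOCH + delta.days
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1000
    # "<" = little-endian without padding, "q" = 8-byte int, "i" = 4-byte int.
    return struct.pack("<qi", nanos_of_day, julian_day)
```

A reader that expects one of these physical types cannot interpret the other 
without an explicit conversion, which is the failure described in this issue.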

Another issue is that Drill can read Hive timestamps from parquet files, but 
only by using the CONVERT_FROM function. By default Drill converts INT96 to 
VARBINARY. In the context of this JIRA I am going to implement the ability for 
Drill to interpret Hive timestamps in parquet files as timestamps implicitly 
by default, controlled by a session/system option (in case some other data 
type is ever stored as INT96 in a parquet file).
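
The intended behaviour can be sketched in Python as follows. This is only an 
illustration under stated assumptions, not the Drill implementation: the 
function names and the `as_timestamp` flag are hypothetical stand-ins for the 
proposed option, the INT96 layout (8-byte nanoseconds of day plus 4-byte Julian 
day, little-endian) is the commonly used Hive/Impala convention, and the value 
is treated as UTC.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 value into a UTC datetime (assumed layout)."""
    if len(raw) != 12:
        raise ValueError("INT96 value must be exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # datetime only has microsecond resolution, so sub-microsecond digits are dropped.
    return EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1000)


def read_int96(raw: bytes, as_timestamp: bool = True):
    """Hypothetical reader switch: timestamp when enabled, raw bytes otherwise."""
    return decode_int96_timestamp(raw) if as_timestamp else raw
```

With the flag off, the caller gets the raw 12 bytes back, which mirrors the 
current VARBINARY behaviour and keeps a way out if INT96 ever carries 
something other than a timestamp.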


> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4373
>                 URL: https://issues.apache.org/jira/browse/DRILL-4373
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive, Storage - Parquet
>            Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
