[
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586088#comment-15586088
]
ASF GitHub Bot commented on DRILL-4373:
---------------------------------------
Github user parthchandra commented on a diff in the pull request:
https://github.com/apache/drill/pull/600#discussion_r83908798
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -739,30 +739,54 @@ public void runTestAndValidate(String selection, String validationSelection, Str
}
/*
- Test the reading of an int96 field. Impala encodes timestamps as int96 fields
+ Impala encodes timestamp values as int96 fields. Test the reading of an int96 field with two converters:
+ the first one converts parquet INT96 into drill VARBINARY and the second one (works while
+ store.parquet.reader.int96_as_timestamp option is enabled) converts parquet INT96 into drill TIMESTAMP.
*/
@Test
public void testImpalaParquetInt96() throws Exception {
compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
+ try {
+ test("alter session set %s = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+ compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
--- End diff --
GitHub seems to have swallowed the previous comments, so I am including
@vdiravka's questions here:
> 1) Is it better to compare the result with baseline columns and values from
the file, or is it OK to compare against sqlBaselineQuery with the new
PARQUET_READER_INT96_AS_TIMESTAMP option disabled?
> In the process of investigating this test I found that the primitive data
type of the column in the file int96_dict_change.parquet is BINARY, not INT96.
> 2) I am a little confused by this. Do we need to convert this BINARY
to TIMESTAMP as well? The CONVERT_FROM function with the TIMESTAMP_IMPALA
argument works properly for this field. I will investigate further whether
Impala and Hive can store timestamps as parquet BINARY.
For 1) I think it is better to compare values from the file as opposed to
running with the PARQUET_READER_INT96_AS_TIMESTAMP option disabled.
For 2) Can you correct the int96 data in the file? AFAIK, the data should
be int96 for the test.
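
For reference, here is a rough sketch of what an INT96-to-timestamp conversion
involves, assuming the usual Impala/Hive layout of 8 little-endian bytes of
nanoseconds within the day followed by 4 little-endian bytes of Julian day
number. The class and method names are made up for illustration; this is not
the converter from the pull request, and it ignores any time zone adjustment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Drill: decodes the 12-byte INT96 layout
// commonly written by Impala and Hive for timestamps.
public class Int96TimestampSketch {
    // Julian day number of 1970-01-01, the Unix epoch.
    static final long JULIAN_DAY_OF_EPOCH = 2440588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    /** Converts a 12-byte INT96 value to milliseconds since the Unix epoch. */
    public static long int96ToEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0..7: nanoseconds within the day
        long julianDay = buf.getInt();   // bytes 8..11: Julian day number
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
                + nanosOfDay / NANOS_PER_MILLI;
    }

    /** Builds a 12-byte INT96 value; used here only to create sample input. */
    public static byte[] toInt96(int julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    public static void main(String[] args) {
        // Midnight on the epoch day.
        System.out.println(int96ToEpochMillis(toInt96(2440588, 0L)));             // prints 0
        // One day plus 1.5 seconds after the epoch.
        System.out.println(int96ToEpochMillis(toInt96(2440589, 1_500_000_000L))); // prints 86401500
    }
}
```

A column whose primitive type is BINARY carries no such fixed 12-byte
guarantee, which is why the test file should hold real INT96 data.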
> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Hive, Storage - Parquet
> Affects Versions: 1.8.0
> Reporter: Rahul Challapalli
> Assignee: Karthikeyan Manivannan
> Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a
> hive table on top of the parquet file and use "timestamp" as the column type,
> drill fails to read the hive table through the hive storage plugin
> Implementation:
> Added an int96-to-timestamp converter for both parquet readers, controlled
> by the system / session option "store.parquet.reader.int96_as_timestamp".
> The option is false by default so that old query scripts that use the
> "convert_from TIMESTAMP_IMPALA" function keep working.
> When the option is true, using that function is unnecessary and can cause
> the query to fail.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)