ASF GitHub Bot commented on DRILL-4373:

Github user parthchandra commented on a diff in the pull request:

    --- Diff: 
    @@ -739,30 +739,54 @@ public void runTestAndValidate(String selection, 
String validationSelection, Str
    -  Test the reading of an int96 field. Impala encodes timestamps as int96 
    +    Impala encodes timestamp values as int96 fields. Test the reading of 
an int96 field with two converters:
    +    the first one converts parquet INT96 into drill VARBINARY and the 
second one (works while
    +    store.parquet.reader.int96_as_timestamp option is enabled) converts 
parquet INT96 into drill TIMESTAMP.
       public void testImpalaParquetInt96() throws Exception {
    +    try {
    +      test("alter session set %s = true", 
    +      compareParquetReadersColumnar("field_impala_ts", 
    --- End diff --
    Github seems to have swallowed the previous comments so including 
@vdiravka's questions here:
    >  1) Is it better to compare result with baseline columns and values from 
the file or it is ok to compare with sqlBaselineQuery and disabled new 
    > In the process of investigating this test I found that the primitive data 
type of the column in the file int96_dict_change.parquet is BINARY, not INT96.
    > 2) I am a little bit confused by this. Do we need to convert this BINARY 
to TIMESTAMP as well? The CONVERT_FROM function with the IMPALA_TIMESTAMP 
argument works properly for this field. I will investigate further whether 
Impala and Hive can store timestamps as parquet BINARY.
    For 1) I think it is better to compare values from the file as opposed to 
running with the PARQUET_READER_INT96_AS_TIMESTAMP option disabled.
    For 2) Can you correct the int96 data in the file? AFAIK, the data should 
be int96 for the test.

> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>                 Key: DRILL-4373
>                 URL: https://issues.apache.org/jira/browse/DRILL-4373
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive, Storage - Parquet
>    Affects Versions: 1.8.0
>            Reporter: Rahul Challapalli
>            Assignee: Karthikeyan Manivannan
>              Labels: doc-impacting
>             Fix For: 1.9.0
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin
> Implementation: 
> Added an int96-to-timestamp converter for both parquet readers, controlled 
> by the system / session option "store.parquet.int96_as_timestamp".
> The option is false by default so that old query scripts using the 
> "convert_from TIMESTAMP_IMPALA" function keep working.
> When the option is true, using that function is unnecessary and can cause 
> the query to fail.
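
For readers unfamiliar with the encoding under discussion, here is a minimal sketch of what an int96-to-timestamp conversion has to do. It assumes the layout commonly written by Impala and Hive: 12 bytes, little-endian, with the first 8 bytes holding nanoseconds within the day and the last 4 bytes holding the Julian day number. The class and method names (Int96TimestampSketch, int96ToEpochMillis, encode) are hypothetical and are not taken from Drill's readers; time-zone adjustment is deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not Drill's actual converter.
public final class Int96TimestampSketch {
    // Julian day number of 1970-01-01 (the Unix epoch).
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Converts a 12-byte INT96 value to milliseconds since the Unix epoch.
    // Assumed layout: 8 bytes nanos-of-day, then 4 bytes Julian day, little-endian.
    public static long int96ToEpochMillis(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt();
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
                + nanosOfDay / NANOS_PER_MILLI;
    }

    // Helper for building sample values in the same assumed layout.
    public static byte[] encode(int julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    public static void main(String[] args) {
        // One day and 1.5 seconds after the epoch.
        byte[] sample = encode(2_440_589, 1_500_000_000L);
        System.out.println(int96ToEpochMillis(sample));
    }
}
```

A value stored as parquet BINARY rather than INT96 could carry the same 12 bytes, which may be why the convert_from path still works on such a column; that is an inference from the layout, not something verified against the test file.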

This message was sent by Atlassian JIRA
