Peter Franzen created DRILL-8421:
------------------------------------

             Summary: Parquet TIMESTAMP_MICROS columns in WHERE clauses are not 
converted to milliseconds before filtering
                 Key: DRILL-8421
                 URL: https://issues.apache.org/jira/browse/DRILL-8421
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.21.0
            Reporter: Peter Franzen


When using Drill with parquet files where the timestamp columns are in 
microseconds, Drill converts the microsecond values to milliseconds when 
displayed. However, when using a timestamp column in WHERE clauses it looks 
like the original microsecond value is used instead of the adjusted millisecond 
value when filtering records.

*To Reproduce*
Assume a parquet file in a directory "Test" with a column _timestampCol_ having 
the type {{{}org.apache.parquet.schema.OriginalType.TIMESTAMP_MICROS{}}}.

Assume there are two records with the values 1673981999806149 and 
1674759597743552, respectively, in that column (i.e. the UTC dates 
2023-01-17T18:59:59.806149 and 2023-01-26T18:59:57.743552)
 # Execute the query
{{SELECT timestampCol FROM dfs.Test;}}
The result includes both records, as expected.

 # Execute the query
{{SELECT timestampCol FROM dfs.Test WHERE timestampCol < 
TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
This produces an empty result although both records have a value less than the 
argument.

 # Execute
{{SELECT timestampCol FROM dfs.Test WHERE timestampCol > 
TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
The result includes both records although neither have a value greater than the 
argument.

*Expected behavior*
The query in 2) above should produce a result with both records, and the query 
in 3) should produce an empty result.

*Additional context*
Even timestamps long into the future produce results with both records, e.g.:
{{SELECT timestampCol FROM dfs.Test WHERE timestampCol > 
TO_TIMESTAMP('2502-04-04 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}

Manually converting the timestamp column to milliseconds produces the expected 
result:
{{SELECT timestampCol FROM dfs.Test WHERE 
TO_TIMESTAMP(CONVERT_FROM(CONVERT_TO(timestampCol, 'TIMESTAMP_EPOCH'), 
'BIGINT')/1000) < TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
produces a result with both records.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to