[ https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709893#comment-17709893 ]
James Turton commented on DRILL-8421: ------------------------------------- [~handmadecode] thanks for creating this. You'll see when you open a PR to the Apache Drill repo that the template will ask you for a link back to here. > Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to > milliseconds before filtering > ---------------------------------------------------------------------------------------------------- > > Key: DRILL-8421 > URL: https://issues.apache.org/jira/browse/DRILL-8421 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet > Affects Versions: 1.21.0 > Reporter: Peter Franzen > Priority: Major > > When using Drill with parquet files where the timestamp columns are in > microseconds, Drill converts the microsecond values to milliseconds when > displayed. However, when using a timestamp column in WHERE clauses it looks > like the original microsecond value is used instead of the adjusted > millisecond value when filtering records. > *To Reproduce* > Assume a parquet file in a directory "Test" with a column _timestampCol_ > having the type > {{{}org.apache.parquet.schema.OriginalType.TIMESTAMP_MICROS{}}}. > Assume there are two records with the values 1673981999806149 and > 1674759597743552, respectively, in that column (i.e. the UTC dates > 2023-01-17T18:59:59.806149 and 2023-01-26T18:59:57.743552) > # Execute the query > {{SELECT timestampCol FROM dfs.Test;}} > The result includes both records, as expected. > # Execute the query > {{SELECT timestampCol FROM dfs.Test WHERE timestampCol < > TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > This produces an empty result although both records have a value less than > the argument. > # Execute > {{SELECT timestampCol FROM dfs.Test WHERE timestampCol > > TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > The result includes both records although neither have a value greater than > the argument. > *Expected behavior* > The query in 2) above should produce a result with both records, and the > query in 3) should produce an empty result. > *Additional context* > Even timestamps long into the future produce results with both records, e.g.: > {{SELECT timestampCol FROM dfs.Test WHERE timestampCol > > TO_TIMESTAMP('2502-04-04 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > Manually converting the timestamp column to milliseconds produces the > expected result: > {{SELECT timestampCol FROM dfs.Test WHERE > TO_TIMESTAMP(CONVERT_FROM(CONVERT_TO(timestampCol, 'TIMESTAMP_EPOCH'), > 'BIGINT')/1000) < TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > produces a result with both records. -- This message was sent by Atlassian Jira (v8.20.10#820010)