[ https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711682#comment-17711682 ]
Peter Franzen commented on DRILL-8421: -------------------------------------- This problem also exists for time columns where the unit is microseconds. The symptoms are different, however: {code:java} SELECT timeCol FROM dfs.Test WHERE timeColĀ > TO_TIME('00:15:30.987','HH:mm:ss.SSS'); {code} gives the error {code:java} org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: ClassCastException: class java.lang.Integer cannot be cast to class java.lang.Long {code} and the logs show the stacktrace: {code:java} Caused by: java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.lang.Long (java.lang.Integer and java.lang.Long are in module java.base of loader 'bootstrap') at .......(:0) at org.apache.drill.exec.expr.ComparisonPredicate.lambda$createGTPredicate$1(ComparisonPredicate.java:185) at org.apache.drill.exec.expr.ComparisonPredicate.matches(ComparisonPredicate.java:122) at org.apache.drill.exec.store.parquet.FilterEvaluatorUtils.matches(FilterEvaluatorUtils.java:116) at org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata$GroupScanWithMetadataFilterer.filterAndGetMetadata(AbstractGroupScanWithMetadata.java:1092) at org.apache.drill.exec.store.parquet.AbstractParquetGroupScan$RowGroupScanFilterer.filterRowGroupMetadata(AbstractParquetGroupScan.java:610) at org.apache.drill.exec.store.parquet.AbstractParquetGroupScan$RowGroupScanFilterer.getFiltered(AbstractParquetGroupScan.java:565) at org.apache.drill.exec.store.parquet.AbstractParquetGroupScan.applyFilter(AbstractParquetGroupScan.java:265) at org.apache.drill.exec.store.parquet.FilePushDownFilter.doOnMatch(FilePushDownFilter.java:182) at org.apache.drill.exec.store.parquet.FilePushDownFilter$1.onMatch(FilePushDownFilter.java:82) at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:337) at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556) at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:420) {code} The problem, as far as I can tell, is that the filter evaluator compares a time expression, which is an Integer, to a non-truncated microsecond time column, which is a Long. I'll add a fix for this in the upcoming PR (unless a separate issue would be preferable) > Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to > milliseconds before filtering > ---------------------------------------------------------------------------------------------------- > > Key: DRILL-8421 > URL: https://issues.apache.org/jira/browse/DRILL-8421 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet > Affects Versions: 1.21.0 > Reporter: Peter Franzen > Priority: Major > Fix For: 1.21.1 > > > When using Drill with parquet files where the timestamp columns are in > microseconds, Drill converts the microsecond values to milliseconds when > displayed. However, when using a timestamp column in WHERE clauses it looks > like the original microsecond value is used instead of the adjusted > millisecond value when filtering records. > *To Reproduce* > Assume a parquet file in a directory "Test" with a column _timestampCol_ > having the type > {{{}org.apache.parquet.schema.OriginalType.TIMESTAMP_MICROS{}}}. > Assume there are two records with the values 1673981999806149 and > 1674759597743552, respectively, in that column (i.e. the UTC dates > 2023-01-17T18:59:59.806149 and 2023-01-26T18:59:57.743552) > # Execute the query > {{SELECT timestampCol FROM dfs.Test;}} > The result includes both records, as expected. > # Execute the query > {{SELECT timestampCol FROM dfs.Test WHERE timestampCol < > TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > This produces an empty result although both records have a value less than > the argument. > # Execute > {{SELECT timestampCol FROM dfs.Test WHERE timestampCol > > TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > The result includes both records although neither have a value greater than > the argument. > *Expected behavior* > The query in 2) above should produce a result with both records, and the > query in 3) should produce an empty result. > *Additional context* > Even timestamps long into the future produce results with both records, e.g.: > {{SELECT timestampCol FROM dfs.Test WHERE timestampCol > > TO_TIMESTAMP('2502-04-04 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > Manually converting the timestamp column to milliseconds produces the > expected result: > {{SELECT timestampCol FROM dfs.Test WHERE > TO_TIMESTAMP(CONVERT_FROM(CONVERT_TO(timestampCol, 'TIMESTAMP_EPOCH'), > 'BIGINT')/1000) < TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > produces a result with both records. -- This message was sent by Atlassian Jira (v8.20.10#820010)