[ 
https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711682#comment-17711682
 ] 

Peter Franzen commented on DRILL-8421:
--------------------------------------

This problem also exists for time columns where the unit is microseconds. The 
symptoms are different, however:
{code:java}
SELECT timeCol FROM dfs.Test WHERE timeColĀ > 
TO_TIME('00:15:30.987','HH:mm:ss.SSS');
{code}
gives the error
{code:java}
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
ClassCastException: class java.lang.Integer cannot be cast to class 
java.lang.Long
{code}
and the logs show the stacktrace:
{code:java}
Caused by: java.lang.ClassCastException: class java.lang.Integer cannot be cast 
to class java.lang.Long (java.lang.Integer and java.lang.Long are in module 
java.base of loader 'bootstrap')
        at .......(:0)
        at 
org.apache.drill.exec.expr.ComparisonPredicate.lambda$createGTPredicate$1(ComparisonPredicate.java:185)
        at 
org.apache.drill.exec.expr.ComparisonPredicate.matches(ComparisonPredicate.java:122)
        at 
org.apache.drill.exec.store.parquet.FilterEvaluatorUtils.matches(FilterEvaluatorUtils.java:116)
        at 
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata$GroupScanWithMetadataFilterer.filterAndGetMetadata(AbstractGroupScanWithMetadata.java:1092)
        at 
org.apache.drill.exec.store.parquet.AbstractParquetGroupScan$RowGroupScanFilterer.filterRowGroupMetadata(AbstractParquetGroupScan.java:610)
        at 
org.apache.drill.exec.store.parquet.AbstractParquetGroupScan$RowGroupScanFilterer.getFiltered(AbstractParquetGroupScan.java:565)
        at 
org.apache.drill.exec.store.parquet.AbstractParquetGroupScan.applyFilter(AbstractParquetGroupScan.java:265)
        at 
org.apache.drill.exec.store.parquet.FilePushDownFilter.doOnMatch(FilePushDownFilter.java:182)
        at 
org.apache.drill.exec.store.parquet.FilePushDownFilter$1.onMatch(FilePushDownFilter.java:82)
        at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:337)
        at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
        at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:420)
{code}
The problem, as far as I can tell, is that the filter evaluator compares a time 
expression, which is an Integer, to a non-truncated microsecond time column, 
which is a Long.

I'll add a fix for this in the upcoming PR (unless a separate issue would be 
preferable)

> Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to 
> milliseconds before filtering
> ----------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8421
>                 URL: https://issues.apache.org/jira/browse/DRILL-8421
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.21.0
>            Reporter: Peter Franzen
>            Priority: Major
>             Fix For: 1.21.1
>
>
> When using Drill with parquet files where the timestamp columns are in 
> microseconds, Drill converts the microsecond values to milliseconds when 
> displayed. However, when using a timestamp column in WHERE clauses it looks 
> like the original microsecond value is used instead of the adjusted 
> millisecond value when filtering records.
> *To Reproduce*
> Assume a parquet file in a directory "Test" with a column _timestampCol_ 
> having the type 
> {{{}org.apache.parquet.schema.OriginalType.TIMESTAMP_MICROS{}}}.
> Assume there are two records with the values 1673981999806149 and 
> 1674759597743552, respectively, in that column (i.e. the UTC dates 
> 2023-01-17T18:59:59.806149 and 2023-01-26T18:59:57.743552)
>  # Execute the query
> {{SELECT timestampCol FROM dfs.Test;}}
> The result includes both records, as expected.
>  # Execute the query
> {{SELECT timestampCol FROM dfs.Test WHERE timestampCol < 
> TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
> This produces an empty result although both records have a value less than 
> the argument.
>  # Execute
> {{SELECT timestampCol FROM dfs.Test WHERE timestampCol > 
> TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
> The result includes both records although neither have a value greater than 
> the argument.
> *Expected behavior*
> The query in 2) above should produce a result with both records, and the 
> query in 3) should produce an empty result.
> *Additional context*
> Even timestamps long into the future produce results with both records, e.g.:
> {{SELECT timestampCol FROM dfs.Test WHERE timestampCol > 
> TO_TIMESTAMP('2502-04-04 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
> Manually converting the timestamp column to milliseconds produces the 
> expected result:
> {{SELECT timestampCol FROM dfs.Test WHERE 
> TO_TIMESTAMP(CONVERT_FROM(CONVERT_TO(timestampCol, 'TIMESTAMP_EPOCH'), 
> 'BIGINT')/1000) < TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
> produces a result with both records.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to