[ https://issues.apache.org/jira/browse/DRILL-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711761#comment-17711761 ]

Peter Franzen commented on DRILL-8423:
--------------------------------------

The problem is caused by the column values being read as 32-bit values rather 
than 64-bit values in
{code:java}
org.apache.drill.exec.store.parquet.columnreaders.ParquetFixedWidthDictionaryReaders.DictionaryTimeMicrosReader::readField(long)
{code}
line 171:
{code:java}
valueVec.getMutator().setSafe(valuesReadInCurrentPass + i, valReader.readInteger() / 1000);
{code}
and line 176:
{code:java}
int value = pageReader.pageData.getInt((int) readStartInBytes + i * dataTypeLengthInBytes);
{code}
The bug is also present in
{code:java}
org.apache.drill.exec.store.parquet.columnreaders.NullableFixedByteAlignedReaders.NullableDictionaryTimeMicrosReader::readField(long)
{code}
The problem should be fixed by using the same read logic as for 
TIMESTAMP_MICROS in {{DictionaryTimeStampMicrosReader}}.
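
To illustrate the difference between the two read widths, here is a minimal standalone sketch. It is not the Drill code: the class {{TimeMicrosReadSketch}} and its methods are hypothetical, and a plain {{java.nio.ByteBuffer}} stands in for the page data. It assumes the values are stored as 8-byte little-endian integers and that the result is kept as an int number of milliseconds, as the division by 1000 in the lines quoted above suggests.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not part of Drill.
class TimeMicrosReadSketch {

    // Mirrors the problematic pattern: only the low 4 bytes of the
    // 8-byte value are read, so values > Integer.MAX_VALUE wrap around.
    static int readMillisAs32Bit(ByteBuffer page, int offset) {
        int micros = page.getInt(offset);
        return micros / 1000;
    }

    // Reads the full 8-byte value first and narrows only after the
    // conversion from microseconds to milliseconds.
    static int readMillisAs64Bit(ByteBuffer page, int offset) {
        long micros = page.getLong(offset);
        return (int) (micros / 1000);
    }

    // Builds a stand-in "page" holding the given values as
    // little-endian 64-bit integers (an assumption of this sketch).
    static ByteBuffer pageOf(long... micros) {
        ByteBuffer buf = ByteBuffer.allocate(micros.length * Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (long m : micros) {
            buf.putLong(m);
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        // The two values from the issue description.
        ByteBuffer page = pageOf(2147483647L, 2147483648L);
        for (int i = 0; i < 2; i++) {
            int offset = i * Long.BYTES;
            System.out.println("32-bit read: " + readMillisAs32Bit(page, offset)
                    + " ms, 64-bit read: " + readMillisAs64Bit(page, offset) + " ms");
        }
    }
}
```

For the second value the 32-bit read yields a negative number of milliseconds, which is consistent with the time being displayed as an offset before midnight rather than after it.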

> Parquet TIME_MICROS columns with values > Integer.MAX_VALUE are not displayed 
> correctly
> ---------------------------------------------------------------------------------------
>
>                 Key: DRILL-8423
>                 URL: https://issues.apache.org/jira/browse/DRILL-8423
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.20.3
>            Reporter: Peter Franzen
>            Priority: Major
>
> Assume a parquet file in a directory "Test" with a column _timeCol_ having 
> the type {{org.apache.parquet.schema.OriginalType.TIME_MICROS}}.
> Assume there are two records with the values 2147483647 and 2147483648, 
> respectively, in that column (i.e. the times 00:35:47.483647 and 
> 00:35:47.483648).
> Executing the query
> {code:java}
> SELECT timeCol FROM dfs.Test;{code}
> produces the result
> {code:java}
> timeCol
> -------
> 00:35:47.483
> 23:24:12.517{code}
> i.e. the microsecond value of Integer.MAX_VALUE + 1 has wrapped around when 
> read from the parquet file (it is displayed as the same number of 
> milliseconds before midnight as the time represented by Integer.MAX_VALUE is 
> after midnight).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
