[ https://issues.apache.org/jira/browse/PARQUET-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059534#comment-14059534 ]

Ryan Blue commented on PARQUET-18:
----------------------------------

[Pull request #18|https://github.com/apache/incubator-parquet-mr/pull/18] is 
ready to be reviewed.

> Cannot read dictionary-encoded pages with all null values
> ---------------------------------------------------------
>
>                 Key: PARQUET-18
>                 URL: https://issues.apache.org/jira/browse/PARQUET-18
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>             Fix For: 1.6.0
>
>
> This is [issue #283|https://github.com/Parquet/parquet-mr/issues/283]. 
> parquet-mr tries to read the bit-width byte in 
> {{DictionaryValuesReader#initFromPage}} even when the incoming offset is 
> already at the end of the byte array because the page contains no values.
> Here's the stack trace:
> {code}
> Caused by: parquet.io.ParquetDecodingException: could not read page Page [id: 1, bytes.size=7, valueCount=100, uncompressedSize=7] in col [id] INT32
>       at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:532)
>       at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:493)
>       at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:546)
>       at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:339)
>       at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:63)
>       at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:58)
>       at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:265)
>       at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:60)
>       at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:74)
>       at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:112)
>       at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:174)
>       ... 29 more
> Caused by: java.io.EOFException
>       at parquet.bytes.BytesUtils.readIntLittleEndianOnOneByte(BytesUtils.java:76)
>       at parquet.column.values.dictionary.DictionaryValuesReader.initFromPage(DictionaryValuesReader.java:55)
>       at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:530)
>       ... 39 more
> {code}
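
The failure mode in the quoted description can be illustrated with a small stand-alone sketch. It does not use parquet-mr and is not the change in the pull request: the class {{BitWidthGuardSketch}}, its methods, and the {{-1}} "no values" sentinel are hypothetical names chosen for illustration. The one-byte read is assumed to behave like the {{BytesUtils.readIntLittleEndianOnOneByte}} frame in the trace, that is, to throw {{EOFException}} when no byte is left. The 7-byte page with offset 7 mirrors {{bytes.size=7}} from the trace, assuming the offset sits at the end of the page.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-alone sketch; none of these names come from parquet-mr.
final class BitWidthGuardSketch {

    // Sentinel returned when the page has no bytes left for dictionary ids.
    static final int NO_VALUES = -1;

    // Reads a single unsigned byte and fails with EOFException if the
    // stream is exhausted (assumed to mirror the helper in the stack trace).
    static int readOneByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException();
        }
        return b;
    }

    // Unguarded read: always tries to read the bit-width byte at `offset`,
    // so it throws EOFException when `offset` is already at the end.
    static int readBitWidthUnguarded(byte[] page, int offset) throws IOException {
        InputStream in = new ByteArrayInputStream(page, offset, page.length - offset);
        return readOneByte(in);
    }

    // Guarded read: checks whether any bytes remain before reading.
    static int readBitWidthGuarded(byte[] page, int offset) throws IOException {
        if (offset >= page.length) {
            return NO_VALUES; // all values are null, nothing to decode
        }
        return readBitWidthUnguarded(page, offset);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = new byte[7]; // same size as the page in the trace
        int offset = page.length;  // offset at the end: no value bytes

        try {
            readBitWidthUnguarded(page, offset);
        } catch (EOFException e) {
            System.out.println("unguarded read failed: " + e);
        }
        System.out.println("guarded read returned: " + readBitWidthGuarded(page, offset));
    }
}
```

Returning a sentinel is only one way to express "nothing to read"; a real reader could just as well skip creating the decoder for such a page.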



--
This message was sent by Atlassian JIRA
(v6.2#6252)
