[ https://issues.apache.org/jira/browse/PARQUET-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059534#comment-14059534 ]
Ryan Blue commented on PARQUET-18:
----------------------------------
[Pull request #18|https://github.com/apache/incubator-parquet-mr/pull/18] is
ready to be reviewed.
> Cannot read dictionary-encoded pages with all null values
> ---------------------------------------------------------
>
> Key: PARQUET-18
> URL: https://issues.apache.org/jira/browse/PARQUET-18
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Fix For: 1.6.0
>
>
> This is [issue #283|https://github.com/Parquet/parquet-mr/issues/283].
> Parquet-mr tries to read the bit-width byte in
> {{DictionaryValuesReader#initFromPage}} even when the page contains no values
> and the incoming offset is already at the end of the byte array.
> Here's the stack trace:
> {code}
> Caused by: parquet.io.ParquetDecodingException: could not read page Page [id: 1, bytes.size=7, valueCount=100, uncompressedSize=7] in col [id] INT32
> at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:532)
> at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:493)
> at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:546)
> at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:339)
> at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:63)
> at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:58)
> at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:265)
> at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:60)
> at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:74)
> at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:112)
> at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:174)
> ... 29 more
> Caused by: java.io.EOFException
> at parquet.bytes.BytesUtils.readIntLittleEndianOnOneByte(BytesUtils.java:76)
> at parquet.column.values.dictionary.DictionaryValuesReader.initFromPage(DictionaryValuesReader.java:55)
> at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:530)
> ... 39 more
> {code}
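For anyone following along, the failure mode in the description can be sketched in isolation. This is only an illustration of the idea, not the code in the pull request: the class and method names below are hypothetical, the 7-byte page is a zero-filled placeholder sized to match the trace, and returning an empty OptionalInt is just one possible way to signal "no bit width to read".

```java
import java.io.EOFException;
import java.util.OptionalInt;

// Hypothetical, standalone sketch; none of these names come from parquet-mr.
public class EmptyDictionaryPageSketch {

    // Mimics the failing behavior: always read one byte for the bit width,
    // even when the offset already points past the last byte of the page.
    static int readBitWidthUnguarded(byte[] page, int offset) throws EOFException {
        if (offset >= page.length) {
            throw new EOFException();
        }
        return page[offset] & 0xFF;
    }

    // Guarded variant: when no bytes remain (for example, every value in the
    // page is null), report "no bit width" instead of attempting the read.
    static OptionalInt readBitWidthGuarded(byte[] page, int offset) {
        if (offset >= page.length) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(page[offset] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] page = new byte[7]; // placeholder contents, size taken from the trace
        int offset = page.length;  // the levels consumed every byte, no values follow

        try {
            readBitWidthUnguarded(page, offset);
        } catch (EOFException e) {
            System.out.println("unguarded read failed: " + e);
        }
        System.out.println("guarded read has bit width: "
                + readBitWidthGuarded(page, offset).isPresent());
    }
}
```

The guarded variant leaves it to the caller to decide what an absent bit width means; a reader built this way would simply have no values to decode for that page.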
--
This message was sent by Atlassian JIRA
(v6.2#6252)