bwjoh opened a new issue, #3013:
URL: https://github.com/apache/parquet-java/issues/3013
### Describe the bug, including details regarding any error messages, version, and platform.
Noticed when upgrading parquet-java from 1.13.1 to 1.14.1:
```
java.lang.ClassCastException: class org.apache.parquet.column.values.dictionary.DictionaryValuesReader cannot be cast to class org.apache.parquet.column.values.deltastrings.DeltaByteArrayReader (org.apache.parquet.column.values.dictionary.DictionaryValuesReader and org.apache.parquet.column.values.deltastrings.DeltaByteArrayReader are in unnamed module of loader 'app')
    at org.apache.parquet.column.values.deltastrings.DeltaByteArrayReader.setPreviousReader(DeltaByteArrayReader.java:92)
    at org.apache.parquet.column.impl.ColumnReaderBase.initDataReader(ColumnReaderBase.java:734)
    at org.apache.parquet.column.impl.ColumnReaderBase.readPageV2(ColumnReaderBase.java:766)
    at org.apache.parquet.column.impl.ColumnReaderBase.access$400(ColumnReaderBase.java:56)
    at org.apache.parquet.column.impl.ColumnReaderBase$3.visit(ColumnReaderBase.java:695)
    at org.apache.parquet.column.impl.ColumnReaderBase$3.visit(ColumnReaderBase.java:686)
    at org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:232)
    at org.apache.parquet.column.impl.ColumnReaderBase.readPage(ColumnReaderBase.java:686)
    at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:660)
    at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:802)
    at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
    at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:427)
```
This appears to be due to PARQUET-2431 -
https://github.com/apache/parquet-java/pull/1274/files#diff-362b7d44b24283c1bb1f6ca3e124cb72706a33ed96d86b58bf3339f20aafb4e9R732
Looking into how my code hit this, it seems that
`CorruptDeltaByteArrays.requiresSequentialReads` was previously acting as an
implicit `dataColumn instanceof RequiresPreviousReader` check:
`CorruptDeltaByteArrays.requiresSequentialReads` can only return true when
`encoding == Encoding.DELTA_BYTE_ARRAY`, and
`org.apache.parquet.column.values.RequiresPreviousReader` is only implemented
by the *DeltaByteArrayReader classes.
Without a check on `previousReader instanceof RequiresPreviousReader`, the
ClassCastException above is possible whenever the previous reader is of another
type, such as the `DictionaryValuesReader` in the trace.
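To make the failure mode concrete, here is a minimal, self-contained sketch of the cast problem. It does not use parquet-java: `PreviousReaderSketch`, `DictionaryReader`, `DeltaReader`, `linkUnchecked` and `linkChecked` are hypothetical stand-ins, and the guard in `linkChecked` is only one possible way to avoid the exception, not necessarily the fix the project would choose.

```java
// Simplified stand-ins for the parquet-java types named above; not the real classes.
class PreviousReaderSketch {
    interface ValuesReader {}

    interface RequiresPreviousReader {
        void setPreviousReader(ValuesReader previous);
    }

    // Plays the role of DictionaryValuesReader.
    static class DictionaryReader implements ValuesReader {}

    // Plays the role of DeltaByteArrayReader.
    static class DeltaReader implements ValuesReader, RequiresPreviousReader {
        DeltaReader previous;

        @Override
        public void setPreviousReader(ValuesReader reader) {
            // Unchecked downcast: throws if the previous page was not delta-encoded.
            this.previous = (DeltaReader) reader;
        }
    }

    // Checks only the current reader, which is the situation this report describes.
    static void linkUnchecked(ValuesReader current, ValuesReader previous) {
        if (current instanceof RequiresPreviousReader) {
            ((RequiresPreviousReader) current).setPreviousReader(previous);
        }
    }

    // Also checks the previous reader; returns true only if the link was made.
    static boolean linkChecked(ValuesReader current, ValuesReader previous) {
        if (current instanceof RequiresPreviousReader
                && previous instanceof RequiresPreviousReader) {
            ((RequiresPreviousReader) current).setPreviousReader(previous);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ValuesReader dictionaryPage = new DictionaryReader();
        ValuesReader deltaPage = new DeltaReader();

        // Guarded variant skips the link instead of throwing.
        System.out.println("checked link made: " + linkChecked(deltaPage, dictionaryPage));

        try {
            linkUnchecked(deltaPage, dictionaryPage);
        } catch (ClassCastException e) {
            System.out.println("unchecked link failed with ClassCastException");
        }
    }
}
```

In this sketch the unguarded variant throws as soon as a dictionary-backed reader precedes a delta reader, while the guarded variant simply does not link the two.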
This is more likely to happen when using the no-argument
`org.apache.parquet.io.ColumnIOFactory#ColumnIOFactory()` constructor, which
reads files without passing `createdBy`. In my case I was able to fix this by
passing `createdBy`: all Parquet files I have were written after PARQUET-246,
so supplying it prevents `CorruptDeltaByteArrays.requiresSequentialReads` from
returning true:
```
val reader: ParquetFileReader = ...
val fileMetadata = reader.getFooter.getFileMetaData
val createdBy = fileMetadata.getCreatedBy
val columnIO: MessageColumnIO = new ColumnIOFactory(createdBy)...
```
### Component(s)
_No response_