bwjoh opened a new issue, #3013:
URL: https://github.com/apache/parquet-java/issues/3013

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I noticed this when upgrading from 1.13.1 to 1.14.1:
   
   ```
   java.lang.ClassCastException: class org.apache.parquet.column.values.dictionary.DictionaryValuesReader cannot be cast to class org.apache.parquet.column.values.deltastrings.DeltaByteArrayReader (org.apache.parquet.column.values.dictionary.DictionaryValuesReader and org.apache.parquet.column.values.deltastrings.DeltaByteArrayReader are in unnamed module of loader 'app')
        at org.apache.parquet.column.values.deltastrings.DeltaByteArrayReader.setPreviousReader(DeltaByteArrayReader.java:92)
        at org.apache.parquet.column.impl.ColumnReaderBase.initDataReader(ColumnReaderBase.java:734)
        at org.apache.parquet.column.impl.ColumnReaderBase.readPageV2(ColumnReaderBase.java:766)
        at org.apache.parquet.column.impl.ColumnReaderBase.access$400(ColumnReaderBase.java:56)
        at org.apache.parquet.column.impl.ColumnReaderBase$3.visit(ColumnReaderBase.java:695)
        at org.apache.parquet.column.impl.ColumnReaderBase$3.visit(ColumnReaderBase.java:686)
        at org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:232)
        at org.apache.parquet.column.impl.ColumnReaderBase.readPage(ColumnReaderBase.java:686)
        at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:660)
        at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:802)
        at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
        at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:427)
   ```
   
   This appears to be due to PARQUET-2431: https://github.com/apache/parquet-java/pull/1274/files#diff-362b7d44b24283c1bb1f6ca3e124cb72706a33ed96d86b58bf3339f20aafb4e9R732
   
   Looking into how my code hit this, it seems that `CorruptDeltaByteArrays.requiresSequentialReads` was previously doing, in effect, the `dataColumn instanceof RequiresPreviousReader` check: `CorruptDeltaByteArrays.requiresSequentialReads` can only return true when `encoding == Encoding.DELTA_BYTE_ARRAY`, and `org.apache.parquet.column.values.RequiresPreviousReader` is only implemented by the `*DeltaByteArrayReader` classes.
   
   Without a check on `previousReader instanceof RequiresPreviousReader`, the ClassCastException above is possible.
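
The failure pattern can be reproduced in isolation. The sketch below is not parquet-java code: `ValuesReader`, `RequiresPreviousReader`, `DictionaryReaderStandIn`, `DeltaReaderStandIn` and `PreviousReaderSketch` are hypothetical stand-ins written only to mirror the shape of the stack trace, and `initGuarded` is one possible guard, not necessarily the fix the project would choose.

```java
// Hypothetical stand-ins for the types named in the stack trace.
// None of this is the library's actual code.
interface ValuesReader {}

interface RequiresPreviousReader {
    void setPreviousReader(ValuesReader reader);
}

// Stand-in for a dictionary-encoded page reader.
class DictionaryReaderStandIn implements ValuesReader {}

// Stand-in for a delta-byte-array page reader.
class DeltaReaderStandIn implements ValuesReader, RequiresPreviousReader {
    ValuesReader previous;

    @Override
    public void setPreviousReader(ValuesReader reader) {
        // Unconditional downcast: throws ClassCastException when the
        // previous page was read by a different kind of reader.
        this.previous = (DeltaReaderStandIn) reader;
    }
}

final class PreviousReaderSketch {
    // Checks only the reader for the new page, so a dictionary reader
    // can still be handed to setPreviousReader.
    static void initUnguarded(ValuesReader dataColumn, ValuesReader previousReader) {
        if (dataColumn instanceof RequiresPreviousReader) {
            ((RequiresPreviousReader) dataColumn).setPreviousReader(previousReader);
        }
    }

    // Also checks the previous reader (assumed guard, for illustration).
    static void initGuarded(ValuesReader dataColumn, ValuesReader previousReader) {
        if (dataColumn instanceof RequiresPreviousReader
                && previousReader instanceof RequiresPreviousReader) {
            ((RequiresPreviousReader) dataColumn).setPreviousReader(previousReader);
        }
    }
}
```

Calling `PreviousReaderSketch.initUnguarded(new DeltaReaderStandIn(), new DictionaryReaderStandIn())` throws a `ClassCastException`, while `initGuarded` with the same arguments leaves `previous` unset.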
   
   This is more likely to happen when using the no-argument constructor `org.apache.parquet.io.ColumnIOFactory#ColumnIOFactory()` to read files, because that constructor is not given a `createdBy`. In my case I was able to fix this by passing `createdBy`, knowing that all Parquet files I have were written after PARQUET-246; this prevents `CorruptDeltaByteArrays.requiresSequentialReads` from returning true:
   
   ```
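   // Read the writer's createdBy string from the file footer
   val reader: ParquetFileReader = ...
   val fileMetadata = reader.getFooter.getFileMetaData
   val createdBy = fileMetadata.getCreatedBy
   // Pass createdBy to the factory instead of using the no-argument constructor
   val columnIO: MessageColumnIO = new ColumnIOFactory(createdBy)...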
   ```
   
   ### Component(s)
   
   _No response_

