Joey Pereira created PARQUET-2317:
-------------------------------------

             Summary: parquet-format and parquet-format-structures define Util 
with inconsistent methods
                 Key: PARQUET-2317
                 URL: https://issues.apache.org/jira/browse/PARQUET-2317
             Project: Parquet
          Issue Type: Bug
          Components: parquet-format
    Affects Versions: 1.13.0, 1.12.0
            Reporter: Joey Pereira


I have been running into a bug caused by {{parquet-format}} and 
{{parquet-format-structures}} both defining the 
{{org.apache.parquet.format.Util}} class, but with inconsistent sets of methods.

For example, several methods that take a {{BlockCipher}} parameter are defined 
in {{parquet-format-structures}} but not in {{parquet-format}}. Code that calls 
these methods, such as 
{{org.apache.parquet.hadoop.ParquetFileReader.readFooter}}, fails if 
{{parquet-format}} happens to be loaded first on the classpath.
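
A minimal diagnostic sketch for checking this at runtime, not part of 
parquet-mr: the class {{UtilMethodProbe}} and its helpers are hypothetical 
names. It reports which location {{org.apache.parquet.format.Util}} was loaded 
from and whether the three-argument {{readFileMetaData}} overload named in the 
stack trace is present. It uses only JDK reflection, so it compiles without 
Parquet on the classpath.

```java
import java.security.CodeSource;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class UtilMethodProbe {

    // Location (usually a jar URL) the class was loaded from,
    // or "unknown" when the JVM does not expose a code source.
    static String loadedFrom(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
            ? "unknown"
            : src.getLocation().toString();
    }

    // True if cls has a public method with this name whose parameter
    // types, compared by Class.getName(), match exactly.
    static boolean hasMethod(Class<?> cls, String name, String... paramTypeNames) {
        return Arrays.stream(cls.getMethods())
            .filter(m -> m.getName().equals(name))
            .anyMatch(m -> Arrays.equals(
                Arrays.stream(m.getParameterTypes())
                    .map(Class::getName)
                    .toArray(String[]::new),
                paramTypeNames));
    }

    public static void main(String[] args) {
        String className = "org.apache.parquet.format.Util";
        try {
            // initialize=false: load the class without running static initializers.
            Class<?> util = Class.forName(
                className, false, UtilMethodProbe.class.getClassLoader());
            System.out.println(className + " loaded from: " + loadedFrom(util));
            // Signature taken from the NoSuchMethodError message; "[B" is byte[].
            boolean present = hasMethod(util, "readFileMetaData",
                "java.io.InputStream",
                "org.apache.parquet.format.BlockCipher$Decryptor",
                "[B");
            System.out.println("readFileMetaData with Decryptor present: " + present);
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on the classpath");
        }
    }
}
```

Run inside the affected application (for example from the Spark driver), the 
output should show whether the copy of {{Util}} that won the classpath race is 
the one missing the method.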

Here is an example stack trace from a Scala Spark application.

{code}
Caused by: java.lang.NoSuchMethodError: 'org.apache.parquet.format.FileMetaData org.apache.parquet.format.Util.readFileMetaData(java.io.InputStream, org.apache.parquet.format.BlockCipher$Decryptor, byte[])'
at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1441) ~[parquet_hadoop.jar:1.13.1]
at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1438) ~[parquet_hadoop.jar:1.13.1]
at org.apache.parquet.format.converter.ParquetMetadataConverter$NoFilter.accept(ParquetMetadataConverter.java:1173) ~[parquet_hadoop.jar:1.13.1]
at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1438) ~[parquet_hadoop.jar:1.13.1]
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:591) ~[parquet_hadoop.jar:1.13.1]
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:536) ~[parquet_hadoop.jar:1.13.1]
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:530) ~[parquet_hadoop.jar:1.13.1]
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:478) ~[parquet_hadoop.jar:1.13.1]
... (my application code invoking the above)
{code}

Because of issues external to Parquet that I have yet to figure out (a complex 
Spark and dependency setup), the classpath order is not deterministic and I am 
unable to force {{parquet-format-structures}} to load first. Regardless, this 
is a fairly prickly edge to run into, since both modules define overlapping 
classes. {{Util}} is not the only class that appears to be defined by both; it 
is just the one I have been focusing on because of this bug.

These methods appear to have been present since at least 1.12: 
https://github.com/apache/parquet-mr/commit/65b95fb72be8f5a8a193a6f7bc4560fdcd742fc7#diff-852341c99dcae06c8fa2b764bcf3d9e6860e40442d0ab1cf5b935df80a9cacb7
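
To see how many classpath entries define an overlapping class, something like 
the following sketch could be used. {{DuplicateClassFinder}} is a hypothetical 
name, not an existing Parquet or Spark utility. It asks a class loader for 
every resource matching the class file path, and more than one result means 
the class is defined more than once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
final class DuplicateClassFinder {

    // Converts "a.b.C" to the resource path "a/b/C.class".
    static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // All locations visible to the loader that contain the class file.
    static List<URL> definitions(String className, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(resourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<URL> found = definitions("org.apache.parquet.format.Util", loader);
        System.out.println("definitions found: " + found.size());
        for (URL url : found) {
            System.out.println("  " + url);
        }
    }
}
```

The same call can be repeated for any other class name suspected of being 
defined by both modules.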



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
