[
https://issues.apache.org/jira/browse/PARQUET-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joey Pereira updated PARQUET-2317:
----------------------------------
Description:
I have been running into a bug caused by {{parquet-format}} and
{{parquet-format-structures}} both defining the
{{org.apache.parquet.format.Util}} class, but doing so inconsistently.
For example, several methods that take a {{BlockCipher}} parameter are defined
in {{parquet-format-structures}} but not in {{parquet-format}}. Code that
calls these methods, such as
{{org.apache.parquet.hadoop.ParquetFileReader.readFooter}}, fails with a
{{NoSuchMethodError}} if {{parquet-format}} happens to be loaded first on the
classpath.
Here is an example stack trace from a Scala Spark application.
{code:java}
Caused by: java.lang.NoSuchMethodError: 'org.apache.parquet.format.FileMetaData org.apache.parquet.format.Util.readFileMetaData(java.io.InputStream, org.apache.parquet.format.BlockCipher$Decryptor, byte[])'
    at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1441) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1438) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.format.converter.ParquetMetadataConverter$NoFilter.accept(ParquetMetadataConverter.java:1173) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1438) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:591) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:536) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:530) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:478) ~[parquet_hadoop.jar:1.13.1]
    ... (my application code invoking the above)
{code}
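A failure like this can be narrowed down by asking the JVM which locations offer {{Util}} and which one actually won. The following is a minimal diagnostic sketch, not part of Parquet: the class {{ClassOrigin}} and its helpers {{providers}} and {{loadedFrom}} are hypothetical names, and it assumes the class loader that loaded the sketch is the one that sees the Parquet jars (under Spark, a different loader such as the thread context class loader may be the relevant one).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not a Parquet or Spark API.
public class ClassOrigin {

    /** Every location visible to this class loader that offers a .class file for the name. */
    static List<URL> providers(String className) {
        String resource = className.replace('.', '/') + ".class";
        try {
            ClassLoader loader = ClassOrigin.class.getClassLoader();
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Location the class was actually loaded from, or null when the JVM reports none. */
    static URL loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0 ? args[0] : "org.apache.parquet.format.Util";
        for (URL url : providers(name)) {
            System.out.println("provided by: " + url);
        }
        System.out.println("loaded from: " + loadedFrom(Class.forName(name)));
    }
}
```

If more than one "provided by" line is printed, the class is defined by several jars, and the "loaded from" line shows which copy took precedence in that run.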
Because of issues external to Parquet that I have yet to figure out (a complex
Spark and dependency setup), my classpath order is not deterministic and I am
unable to pin {{parquet-format-structures}} ahead of {{parquet-format}}, which
is why I am raising this.
Nonetheless, this is a fairly prickly edge to run into, as both modules define
overlapping classes. {{Util}} is not the only class that appears to be defined
by both; it is just the one I have been focusing on because of this bug.
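The full set of overlapping classes can be listed by comparing the entries of the two jars. This is a rough sketch under stated assumptions: {{JarOverlap}}, {{classEntries}} and {{overlap}} are hypothetical names, and the two jar paths are supplied by the caller rather than taken from this report.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not a Parquet API.
public class JarOverlap {

    /** Names of all .class entries in a jar, sorted. */
    static Set<String> classEntries(Path jar) {
        try (JarFile file = new JarFile(jar.toFile())) {
            return file.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.endsWith(".class"))
                    .collect(Collectors.toCollection(TreeSet::new));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Entries present in both collections, sorted. */
    static Set<String> overlap(Collection<String> first, Collection<String> second) {
        Set<String> shared = new TreeSet<>(first);
        shared.retainAll(second);
        return shared;
    }

    public static void main(String[] args) {
        // Expects two arguments: the paths of the two jars to compare.
        Set<String> shared = overlap(classEntries(Paths.get(args[0])),
                                     classEntries(Paths.get(args[1])));
        shared.forEach(System.out::println);
    }
}
```

Each printed entry is a class that both jars define. A shared name alone does not show that the two copies differ, only that classpath order decides which one is used.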
These methods appear to have been present since at least 1.12:
[https://github.com/apache/parquet-mr/commit/65b95fb72be8f5a8a193a6f7bc4560fdcd742fc7#diff-852341c99dcae06c8fa2b764bcf3d9e6860e40442d0ab1cf5b935df80a9cacb7]
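Whether the copy of {{Util}} that a given classpath resolves actually has the {{BlockCipher}} overloads can be checked with reflection before any footer is read. A minimal sketch, assuming the method is public as the stack trace suggests; {{MethodProbe}} and {{hasOverloadTaking}} are hypothetical names, and only the class, method, and parameter type names come from the trace above.

```java
import java.lang.reflect.Method;

// Hypothetical helper for illustration; not a Parquet API.
public class MethodProbe {

    /** True when the class has a public method of this name with a parameter of the named type. */
    static boolean hasOverloadTaking(Class<?> type, String methodName, String parameterTypeName) {
        for (Method method : type.getMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            for (Class<?> parameter : method.getParameterTypes()) {
                if (parameter.getName().equals(parameterTypeName)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Names taken from the NoSuchMethodError in the stack trace.
        Class<?> util = Class.forName("org.apache.parquet.format.Util");
        boolean present = hasOverloadTaking(
                util, "readFileMetaData", "org.apache.parquet.format.BlockCipher$Decryptor");
        System.out.println("readFileMetaData with Decryptor present: " + present);
    }
}
```

A result of false would indicate that the resolved {{Util}} lacks the overload that {{ParquetMetadataConverter}} calls.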
> parquet-format and parquet-format-structures define Util with inconsistent
> methods
> -------------------------------------------------------------------------------------------
>
> Key: PARQUET-2317
> URL: https://issues.apache.org/jira/browse/PARQUET-2317
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Affects Versions: 1.12.0, 1.13.0
> Reporter: Joey Pereira
> Priority: Major