[ https://issues.apache.org/jira/browse/PARQUET-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Pereira updated PARQUET-2317:
----------------------------------
Description:

I have been running into a bug due to {{parquet-format}} and {{parquet-format-structures}} both defining the {{org.apache.parquet.format.Util}} class but doing so inconsistently.

Examples of this are several methods that take a {{BlockCipher}} parameter, which are defined in {{parquet-format-structures}} but not in {{parquet-format}}. When invoking code that happens to use these, such as {{org.apache.parquet.hadoop.ParquetFileReader.readFooter}}, the code will fail if {{parquet-format}} happens to be loaded first on the classpath.

Here is an example stack trace for a Scala Spark application.

{code:java}
Caused by: java.lang.NoSuchMethodError: 'org.apache.parquet.format.FileMetaData org.apache.parquet.format.Util.readFileMetaData(java.io.InputStream, org.apache.parquet.format.BlockCipher$Decryptor, byte[])'
    at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1441) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1438) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.format.converter.ParquetMetadataConverter$NoFilter.accept(ParquetMetadataConverter.java:1173) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1438) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:591) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:536) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:530) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:478) ~[parquet_hadoop.jar:1.13.1]
    ... (my application code invoking the above)
{code}

Because of issues external to Parquet that I have yet to figure out (a complex Spark and dependency setup), my classpaths are not deterministically ordered and I am unable to pin {{parquet-format-structures}} ahead of {{parquet-format}}, which is why I'm chiming in about this.

Even if that weren't the case, this is a fairly prickly edge to run into, as both modules define overlapping classes. {{Util}} is not the only class that appears to be defined by both; it is just the one I have been focusing on due to this bug.

It appears these methods have been present since at least 1.12: [https://github.com/apache/parquet-mr/commit/65b95fb72be8f5a8a193a6f7bc4560fdcd742fc7#diff-852341c99dcae06c8fa2b764bcf3d9e6860e40442d0ab1cf5b935df80a9cacb7]

was:

I have been running into a bug due to {{parquet-format}} and {{parquet-format-structures}} both defining the {{org.apache.parquet.format.Util}} class but doing so inconsistently.

Examples of this are several methods that take a {{BlockCipher}} parameter, which are defined in {{parquet-format-structures}} but not in {{parquet-format}}. When invoking code that happens to use these, such as {{org.apache.parquet.hadoop.ParquetFileReader.readFooter}}, the code will fail if {{parquet-format}} happens to be loaded first on the classpath.

Here is an example stack trace for a Scala Spark application.

{code:java}
Caused by: java.lang.NoSuchMethodError: 'org.apache.parquet.format.FileMetaData org.apache.parquet.format.Util.readFileMetaData(java.io.InputStream, org.apache.parquet.format.BlockCipher$Decryptor, byte[])'
    at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1441) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1438) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.format.converter.ParquetMetadataConverter$NoFilter.accept(ParquetMetadataConverter.java:1173) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1438) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:591) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:536) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:530) ~[parquet_hadoop.jar:1.13.1]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:478) ~[parquet_hadoop.jar:1.13.1]
    ... (my application code invoking the above)
{code}

Because of issues external to Parquet that I have yet to figure out (a complex Spark and dependency setup), my classpaths are not deterministically ordered and I am unable to pin {{parquet-format-structures}} ahead of {{parquet-format}}, which is why I'm chiming in about this.

Nonetheless, this is a fairly prickly edge to run into, as both modules define overlapping classes. {{Util}} is not the only class that appears to be defined by both; it is just the one I have been focusing on due to this bug.

It appears these methods have been present since at least 1.12: [https://github.com/apache/parquet-mr/commit/65b95fb72be8f5a8a193a6f7bc4560fdcd742fc7#diff-852341c99dcae06c8fa2b764bcf3d9e6860e40442d0ab1cf5b935df80a9cacb7]

> parquet-format and parquet-format-structures define Util with inconsistent methods provided
> -------------------------------------------------------------------------------------------
>
> Key: PARQUET-2317
> URL: https://issues.apache.org/jira/browse/PARQUET-2317
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Affects Versions: 1.12.0, 1.13.0
> Reporter: Joey Pereira
> Priority: Major

--
This message was sent by Atlassian Jira (v8.20.10#820010)
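
For anyone hitting the same {{NoSuchMethodError}}, one way to confirm the diagnosis described above is to ask the class loader where {{org.apache.parquet.format.Util}} comes from. The following is only a minimal sketch under stated assumptions, not code from Parquet: the {{ClassOriginFinder}} class and its method names are hypothetical, and it uses nothing beyond standard JDK calls ({{ClassLoader.getResources}} and reflection). It lists every classpath location that ships the class and reports whether the copy visible to the class loader declares a {{readFileMetaData}} overload taking a {{BlockCipher$Decryptor}}.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Parquet. */
final class ClassOriginFinder {

    /** Returns every location visible to the loader that provides a .class file for className. */
    static List<URL> findDefinitions(String className, ClassLoader loader) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    /**
     * Returns true if the class, as resolved by the loader, declares a method with the
     * given name that has at least one parameter of the given type name.
     */
    static boolean hasMethodWithParameter(String className, String methodName,
                                          String parameterTypeName, ClassLoader loader)
            throws ClassNotFoundException {
        Class<?> cls = Class.forName(className, false, loader);
        for (Method method : cls.getDeclaredMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            for (Class<?> parameterType : method.getParameterTypes()) {
                if (parameterType.getName().equals(parameterTypeName)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the class from this report; pass another class name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.parquet.format.Util";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();

        List<URL> found = findDefinitions(name, loader);
        if (found.isEmpty()) {
            System.out.println(name + " is not on the classpath");
            return;
        }
        for (URL url : found) {
            System.out.println("defined in: " + url);
        }

        boolean hasOverload = hasMethodWithParameter(
                name, "readFileMetaData",
                "org.apache.parquet.format.BlockCipher$Decryptor", loader);
        System.out.println("BlockCipher overload present: " + hasOverload);
    }
}
```

If this prints more than one location and reports that the overload is absent, that would be consistent with the {{parquet-format}} copy of {{Util}} being picked up ahead of the one from {{parquet-format-structures}}.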