I’ll change my vote to a +1.
If I run the tests using -pl :parquet-hadoop, they pass, as do the rest of
the tests. That makes the failure harder to debug, but it gives me
confidence. The error appears to come from the Parquet build and not the
code, so I think this won’t affect downstream users. The Iceberg tests
passing helps validate that.
The error itself is this:
java.lang.Exception: java.lang.NoSuchMethodError: org.apache.parquet.format.LogicalType.getSetField()Lshaded/parquet/org/apache/thrift/TFieldIdEnum;
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NoSuchMethodError: org.apache.parquet.format.LogicalType.getSetField()Lshaded/parquet/org/apache/thrift/TFieldIdEnum;
    at org.apache.parquet.format.converter.ParquetMetadataConverter.getLogicalTypeAnnotation(ParquetMetadataConverter.java:972)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1331)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1286)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1204)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1198)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:546)
    at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:712)
    at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:609)
    at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
    at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
LogicalType is a Parquet class, but the missing method comes from Thrift
(its return type is the shaded TFieldIdEnum). I think there must be a bad
version of Thrift in the test classpath somewhere, but it doesn’t appear to
be in the normal classpath.
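
One way to narrow this down might be to ask the JVM where it actually loaded the classes from. The following is only a rough sketch, not something I have run against this build: ClasspathProbe and its helper methods are hypothetical names, and it assumes the probe is run with the same test classpath as the failing parquet-hadoop tests. It prints the jar or directory each class came from and the return type of getSetField() as seen at runtime.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Parquet.
public class ClasspathProbe {

    // Load the class without running its static initializers; only metadata is needed.
    static Class<?> load(String className) throws ClassNotFoundException {
        return Class.forName(className, false, ClasspathProbe.class.getClassLoader());
    }

    // Report the jar or directory a class was loaded from.
    static String locate(String className) throws ClassNotFoundException {
        CodeSource src = load(className).getProtectionDomain().getCodeSource();
        String where = (src == null || src.getLocation() == null)
                ? "(no code source, e.g. a JDK class)"
                : src.getLocation().toString();
        return className + " <- " + where;
    }

    // Report the erased return type of a public no-argument method.
    static String returnTypeOf(String className, String methodName)
            throws ReflectiveOperationException {
        Method m = load(className).getMethod(methodName);
        return m.getReturnType().getName();
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace, plus the unshaded Thrift name for comparison.
        String[] names = {
            "org.apache.parquet.format.LogicalType",
            "shaded.parquet.org.apache.thrift.TFieldIdEnum",
            "org.apache.thrift.TFieldIdEnum"
        };
        for (String name : names) {
            try {
                System.out.println(locate(name));
            } catch (ClassNotFoundException | LinkageError e) {
                System.out.println(name + " <- not loadable: " + e);
            }
        }
        try {
            System.out.println("getSetField() returns "
                    + returnTypeOf(names[0], "getSetField"));
        } catch (ReflectiveOperationException | LinkageError e) {
            System.out.println("getSetField() could not be resolved: " + e);
        }
    }
}
```

If LogicalType turns out to come from an unexpected jar, or getSetField() reports the unshaded org.apache.thrift.TFieldIdEnum, that would point at a stale or unshaded parquet-format artifact on the test classpath.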
On Wed, Aug 5, 2020 at 5:00 AM Gabor Szadovszky <[email protected]> wrote:
> Hi Ryan,
>
> Did you have time to investigate this issue? Is it reproducible with the
> 1.11.0 release as well?
>
> Any other voters? We only have two binding votes and one is a +0.
>
> Thanks a lot,
> Gabor
>
> On Fri, Jul 31, 2020 at 9:40 AM Gabor Szadovszky <[email protected]> wrote:
>
> > Hi Ryan,
> >
> > I have no idea. We are using thrift 0.12.0 on master since 1.5yrs and I
> > haven't experienced any issues with it in my environment (Linux) nor have I
> > met one in Travis builds.
> > Has anyone else experienced similar issues?
> >
> > Thanks,
> > Gabor
> >
> > On Fri, Jul 31, 2020 at 1:48 AM Ryan Blue <[email protected]>
> > wrote:
> >
> >> +0 for now
> >>
> >> - Tested downstream Iceberg against 1.11.1 in
> >>   https://repository.apache.org/content/repositories/orgapacheparquet-1031
> >> - Verified signature, checksum
> >> - Built and ran tests
> >>
> >> The parquet-hadoop tests are failing with errors like this, indicating the
> >> wrong version of Thrift:
> >>
> >> testSimpleFiltering[0](org.apache.parquet.hadoop.TestColumnIndexFiltering):
> >> org.apache.parquet.format.LogicalType.getSetField()Lshaded/parquet/org/apache/thrift/TFieldIdEnum;
> >>
> >> Any idea what’s happening? My thrift executable reports 0.12.0 and that
> >> matches the thrift.version in the POM.
> >>
> >> On Wed, Jul 29, 2020 at 3:09 PM Gara Walid <[email protected]> wrote:
> >>
> >> > +1 (non-binding)
> >> >
> >> > - Verified the signature.
> >> > - Verified the checksum.
> >> > - Built the source from the tarball and ran tests.
> >> >
> >> > On Wed, Jul 29, 2020 at 11:25 AM Gabor Szadovszky <[email protected]>
> >> > wrote:
> >> >
> >> > > +1 (binding)
> >> > >
> >> > > On Wed, Jul 29, 2020 at 10:23 AM Gabor Szadovszky <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > Hi everyone,
> >> > > >
> >> > > > I propose the following RC to be released as the official Apache
> >> > > > Parquet 1.11.1 release.
> >> > > >
> >> > > > The commit id is 765bd5cd7fdef2af1cecd0755000694b992bfadd
> >> > > > * This corresponds to the tag: apache-parquet-1.11.1-rc1
> >> > > > * https://github.com/apache/parquet-mr/tree/765bd5cd7fdef2af1cecd0755000694b992bfadd
> >> > > >
> >> > > > The release tarball, signature, and checksums are here:
> >> > > > * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.11.1-rc1
> >> > > >
> >> > > > You can find the KEYS file here:
> >> > > > * https://downloads.apache.org/parquet/KEYS
> >> > > >
> >> > > > Binary artifacts are staged in Nexus here:
> >> > > > * https://repository.apache.org/content/groups/staging/org/apache/parquet/
> >> > > >
> >> > > > This release includes changes listed at
> >> > > > https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.1-rc1/CHANGES.md
> >> > > >
> >> > > > Please download, verify, and test.
> >> > > >
> >> > > > Please vote in the next 72 hours.
> >> > > >
> >> > > > [ ] +1 Release this as Apache Parquet 1.11.1
> >> > > > [ ] +0
> >> > > > [ ] -1 Do not release this because...
> >> > > >
> >> > >
> >> >
> >>
> >>
> >> --
> >> Ryan Blue
> >> Software Engineer
> >> Netflix
> >>
> >
>
--
Ryan Blue
Software Engineer
Netflix