[ https://issues.apache.org/jira/browse/PARQUET-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558279#comment-16558279 ]
Andres commented on PARQUET-1356:
---------------------------------

The bug apparently only happens in version 1.10.0 of the library. The same test using 1.9.0 works fine.

> Error when closing writer - Statistics comparator mismatched
> ------------------------------------------------------------
>
>                 Key: PARQUET-1356
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1356
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>         Environment: Mac OS Sierra 10.12.6
>                      IntelliJ 2018.1.6
>                      sbt 0.1
>                      scala-sdk-2.12.4
>                      java jdk1.8.0_144
>            Reporter: Andres
>            Priority: Blocker
>
> Hi all,
> After hitting an error in my custom implementation, I tried to run a test case copied from [here|https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/test/java/org/apache/parquet/avro/TestReadWrite.java] and, surprisingly, got the same error.
> {code:java}
> val schema = new Schema.Parser().parse(
>   "{\n \"type\": \"record\",\n \"name\": \"myrecord\",\n \"fields\": [ {\n \"name\": \"myarray\",\n \"type\": {\n \"type\": \"array\",\n \"items\": \"int\"\n }\n } ]\n}")
> val tmp = File.createTempFile(getClass().getSimpleName(), ".tmp")
> tmp.deleteOnExit()
> tmp.delete()
> val file = new Path(tmp.getPath())
> val testConf = new Configuration()
> val writer = AvroParquetWriter
>   .builder[GenericRecord](file)
>   .withSchema(schema)
>   .withConf(testConf)
>   .build()
>
> // Write a record with an empty array.
> val emptyArray = new util.ArrayList[Integer]()
> val record = new GenericRecordBuilder(schema)
>   .set("myarray", emptyArray)
>   .build()
> writer.write(record)
> writer.close()
>
> val reader = new AvroParquetReader[GenericRecord](testConf, file)
> val nextRecord = reader.read()
> {code}
> The project is Scala + sbt with dependencies as follows:
> {code:java}
> lazy val parquetVersion = "1.10.0"
> lazy val parquet = "org.apache.parquet" % "parquet" % Version.parquetVersion
> lazy val parquetAvro = "org.apache.parquet" % "parquet-avro" % Version.parquetVersion
> {code}
> And this is the stack trace:
> {code:java}
> Statistics comparator mismatched: SIGNED_INT32_COMPARATOR vs. SIGNED_INT32_COMPARATOR (39 milliseconds)
> [info] org.apache.parquet.column.statistics.StatisticsClassException: Statistics comparator mismatched: SIGNED_INT32_COMPARATOR vs. SIGNED_INT32_COMPARATOR
> [info]   at org.apache.parquet.column.statistics.StatisticsClassException.create(StatisticsClassException.java:42)
> [info]   at org.apache.parquet.column.statistics.Statistics.mergeStatistics(Statistics.java:327)
> [info]   at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:119)
> [info]   at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
> [info]   at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
> [info]   at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
> [info]   at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:169)
> [info]   at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
> [info]   at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:301)
> [info]   at uk.co.mypackage.di.nrt.HdfsPipelineSpec.$anonfun$new$4(HdfsPipelineSpec.scala:132)
> {code}
> As you can see, this is confusing: the message reports a mismatch between two comparators that are printed as identical, so no mismatch should occur at all. I would really appreciate help with this issue.
> Thanks

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
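
Since both comparators in the message print as SIGNED_INT32_COMPARATOR, one possibility worth ruling out (an assumption on my part, not something this report confirms) is that two different parquet versions coexist on the test classpath, leaving two copies of the comparator classes that fail an identity check. A minimal build.sbt sketch that pins every org.apache.parquet module to a single version; the exact artifact list here is illustrative:

{code:java}
// build.sbt fragment (a sketch; assumes the mismatch comes from mixed
// parquet versions on the classpath, which this report does not confirm).
lazy val parquetVersion = "1.10.0"

// Force one version for every parquet module that could also arrive
// transitively (e.g. via a Spark or Hadoop dependency).
dependencyOverrides ++= Seq(
  "org.apache.parquet" % "parquet-column" % parquetVersion,
  "org.apache.parquet" % "parquet-hadoop" % parquetVersion,
  "org.apache.parquet" % "parquet-avro"   % parquetVersion
)
{code}

Running {{sbt dependencyTree}} (built into sbt 1.4+, or via the sbt-dependency-graph plugin on older releases) then shows whether anything still pulls in an older parquet-column.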