[ 
https://issues.apache.org/jira/browse/PARQUET-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres updated PARQUET-1356:
----------------------------
    Description: 
Hi all

After getting an error in my custom implementation, I tried to run a test case copied from [here|https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/test/java/org/apache/parquet/avro/TestReadWrite.java], and surprisingly I got the same error.
{code:java}
import java.io.File
import java.util
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericRecord, GenericRecordBuilder}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.{AvroParquetReader, AvroParquetWriter}

val schema = new Schema.Parser().parse(
  """{ "type": "record", "name": "myrecord",
    |  "fields": [ { "name": "myarray",
    |    "type": { "type": "array", "items": "int" } } ] }""".stripMargin)

val tmp = File.createTempFile(getClass().getSimpleName(), ".tmp")
tmp.deleteOnExit()
tmp.delete()
val file = new Path(tmp.getPath())
val testConf = new Configuration()
val writer = AvroParquetWriter
  .builder[GenericRecord](file)
  .withSchema(schema)
  .withConf(testConf)
  .build()

// Write a record with an empty array.
val emptyArray = new util.ArrayList[Integer]()
val record = new GenericRecordBuilder(schema)
  .set("myarray", emptyArray)
  .build()
writer.write(record)
writer.close()

val reader = new AvroParquetReader[GenericRecord](testConf, file)
val nextRecord = reader.read(){code}
The project is Scala + sbt with the following dependencies:

lazy val parquetVersion = "1.10.0"

lazy val parquet = "org.apache.parquet" % "parquet" % parquetVersion
lazy val parquetAvro = "org.apache.parquet" % "parquet-avro" % parquetVersion

And this is the stack trace:
{code:java}
Statistics comparator mismatched: SIGNED_INT32_COMPARATOR vs. SIGNED_INT32_COMPARATOR (39 milliseconds)
[info] org.apache.parquet.column.statistics.StatisticsClassException: Statistics comparator mismatched: SIGNED_INT32_COMPARATOR vs. SIGNED_INT32_COMPARATOR
[info] at org.apache.parquet.column.statistics.StatisticsClassException.create(StatisticsClassException.java:42)
[info] at org.apache.parquet.column.statistics.Statistics.mergeStatistics(Statistics.java:327)
[info] at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:119)
[info] at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
[info] at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
[info] at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
[info] at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:169)
[info] at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
[info] at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:301)
[info] at uk.co.mypackage.di.nrt.HdfsPipelineSpec.$anonfun$new$4(HdfsPipelineSpec.scala:132)
{code}
As you can see, this is confusing: the exception reports a comparator mismatch, yet the two comparator names it prints are identical. I would really appreciate help with this issue.
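One guess on my side (purely an assumption, not something I have confirmed): since the two comparators print the same name but are reported as mismatched, perhaps two copies of the Parquet classes end up on the classpath, e.g. via conflicting transitive dependency versions or separate class loaders, so identity checks between them fail. A minimal, Parquet-free sketch of that effect:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderDemo {
    // Load this very class again through a fresh, isolated classloader
    // (parent = null, so it does not delegate to the application loader).
    static Class<?> loadIsolated() throws Exception {
        URL here = LoaderDemo.class.getProtectionDomain()
                                   .getCodeSource().getLocation();
        return new URLClassLoader(new URL[]{here}, null)
                .loadClass("LoaderDemo");
    }

    public static void main(String[] args) throws Exception {
        Class<?> a = loadIsolated();
        Class<?> b = loadIsolated();
        // Same fully qualified name...
        System.out.println(a.getName().equals(b.getName())); // prints "true"
        // ...but not the same Class object, so identity/equality checks
        // fail, producing errors that read like "X vs. X".
        System.out.println(a == b); // prints "false"
    }
}
```

If something like this is the cause here, it would explain why the message shows SIGNED_INT32_COMPARATOR on both sides yet still throws.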

Thanks



> Error when closing writer - Statistics comparator mismatched
> ------------------------------------------------------------
>
>                 Key: PARQUET-1356
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1356
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>         Environment: Mac OS Sierra 10.12.6
> IntelliJ 2018.1.6
> sbt 0.1
> scala-sdk-2.12.4
> java jdk1.8.0_144
>            Reporter: Andres
>            Priority: Blocker
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
