I've added an answer to the ticket: https://issues.apache.org/jira/browse/PARQUET-1887
And created a PR for anyone who's interested: https://github.com/apache/parquet-mr/pull/804

Cheers, Fokko

On Wed, 22 Jul 2020 at 12:29, Driesprong, Fokko <[email protected]> wrote:

> Thanks for reaching out Øyvind.
>
> Which version of Parquet are you using? Would it be possible to open a
> ticket and attach the person.parquet file to it? I don't see anything weird
> in your schema or code.
>
> Cheers, Fokko
>
> On Wed, 22 Jul 2020 at 00:12, Øyvind Strømmen <[email protected]> wrote:
>
>> Hi,
>>
>> Please see the code below that reproduces the scenario:
>>
>>     Schema schema = new Schema.Parser().parse("""
>>         {
>>           "type": "record",
>>           "name": "person",
>>           "fields": [
>>             {
>>               "name": "address",
>>               "type": [
>>                 "null",
>>                 {
>>                   "type": "array",
>>                   "items": "string"
>>                 }
>>               ],
>>               "default": null
>>             }
>>           ]
>>         }
>>         """
>>     );
>>
>>     ParquetWriter<GenericRecord> writer =
>>         AvroParquetWriter.<GenericRecord>builder(new
>>             org.apache.hadoop.fs.Path("/tmp/person.parquet"))
>>         .withSchema(schema)
>>         .build();
>>
>>     try {
>>       writer.write(new GenericRecordBuilder(schema).set("address",
>>           Arrays.asList("first", null, "last")).build());
>>     } catch (Exception e) {
>>       e.printStackTrace();
>>     }
>>
>>     try {
>>       writer.write(new GenericRecordBuilder(schema).set("address",
>>           Collections.singletonList("first")).build());
>>     } catch (Exception e) {
>>       e.printStackTrace();
>>     }
>>
>> The first call to AvroParquetWriter#write attempts to add an array with a
>> null element and fails, as expected, with
>> "java.lang.NullPointerException: Array contains a null element at 1".
>> However, from that point on every subsequent call to
>> AvroParquetWriter#write, even with a valid record, fails with
>> "org.apache.parquet.io.InvalidRecordException: 1(r) > 0 ( schema r)",
>> apparently because the state within the RecordConsumer isn't reset
>> between writes.
>>
>> Is this the intended behavior of the writer? And if so, does one have to
>> create a new writer whenever a write fails?
>>
>> Best Regards,
>> Øyvind Strømmen
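One way to avoid the failure mode described in the thread, independent of how the writer handles a failed write, is to reject a bad list before it ever reaches AvroParquetWriter#write. The sketch below is plain Java with no Avro or Parquet dependency; the class and method names (`NullElementGuard`, `firstNullIndex`, `requireNoNullElements`) are hypothetical, and it assumes the only invalid input is a null element inside the non-nullable `"items": "string"` array, as in the repro. A null `address` itself is allowed, since the field's type is the `["null", array]` union.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical pre-write guard (not part of parquet-mr or Avro): checks a list
 * for null elements so an invalid record can be rejected before it is handed
 * to the Parquet writer.
 */
final class NullElementGuard {

    private NullElementGuard() {}

    /** Returns the index of the first null element, or -1 if there is none or the list itself is null. */
    static int firstNullIndex(List<?> values) {
        if (values == null) {
            return -1; // a null array is valid for the ["null", array] union
        }
        int i = 0;
        // Iterate rather than call get(i), so linked lists are not O(n^2).
        for (Object v : values) {
            if (v == null) {
                return i;
            }
            i++;
        }
        return -1;
    }

    /** Throws IllegalArgumentException if the list contains a null element. */
    static void requireNoNullElements(String field, List<?> values) {
        int idx = firstNullIndex(values);
        if (idx >= 0) {
            throw new IllegalArgumentException(
                "Field '" + field + "' contains a null element at " + idx);
        }
    }

    public static void main(String[] args) {
        List<String> bad = Arrays.asList("first", null, "last");
        List<String> good = Collections.singletonList("first");

        System.out.println(firstNullIndex(bad));  // 1
        System.out.println(firstNullIndex(good)); // -1

        try {
            requireNoNullElements("address", bad);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Field 'address' contains a null element at 1
        }
        requireNoNullElements("address", good); // passes without throwing
    }
}
```

In the repro, a call such as `requireNoNullElements("address", list)` would go before building the GenericRecord, so the writer never sees the invalid array. Whether a writer can be reused after a write has already failed is a separate question, addressed in the answer on PARQUET-1887.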
