Thanks for reaching out, Øyvind. Which version of Parquet are you using? Would it be possible to open a ticket and attach the person.parquet file to it? I don't see anything weird in your schema or code.
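In the meantime, a possible workaround, just a sketch assuming up-front validation is acceptable for your use case, is to check each record against the Avro schema before handing it to the writer, so the RecordConsumer never sees a record that can fail halfway through a write (the safeWrite helper below is hypothetical, not part of the Parquet API):

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.parquet.hadoop.ParquetWriter;

    // Hypothetical helper: only forwards records that pass Avro's own
    // validation, so a bad record (e.g. a null element inside an array
    // of strings) is rejected before the RecordConsumer is left in a
    // half-written state.
    static void safeWrite(ParquetWriter<GenericRecord> writer,
                          Schema schema,
                          GenericRecord record) throws IOException {
      if (!GenericData.get().validate(schema, record)) {
        throw new IllegalArgumentException(
            "Record does not conform to schema: " + record);
      }
      writer.write(record);
    }

That way you shouldn't have to recreate the writer after every failed write, though it of course doesn't answer whether the writer's internal state ought to be reset on failure.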
Cheers, Fokko

On Wed, 22 Jul 2020 at 00:12, Øyvind Strømmen <[email protected]> wrote:

> Hi,
>
> Please see the code below, which reproduces the scenario:
>
> import java.util.Arrays;
> import java.util.Collections;
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.generic.GenericRecordBuilder;
> import org.apache.parquet.avro.AvroParquetWriter;
> import org.apache.parquet.hadoop.ParquetWriter;
>
> Schema schema = new Schema.Parser().parse("""
>     {
>       "type": "record",
>       "name": "person",
>       "fields": [
>         {
>           "name": "address",
>           "type": [
>             "null",
>             {
>               "type": "array",
>               "items": "string"
>             }
>           ],
>           "default": null
>         }
>       ]
>     }
>     """);
>
> ParquetWriter<GenericRecord> writer =
>     AvroParquetWriter.<GenericRecord>builder(
>             new org.apache.hadoop.fs.Path("/tmp/person.parquet"))
>         .withSchema(schema)
>         .build();
>
> // First write: the array contains a null element, which the writer
> // rejects with a NullPointerException, as expected.
> try {
>   writer.write(new GenericRecordBuilder(schema)
>       .set("address", Arrays.asList("first", null, "last")).build());
> } catch (Exception e) {
>   e.printStackTrace();
> }
>
> // Second write: a perfectly valid record, which nevertheless fails.
> try {
>   writer.write(new GenericRecordBuilder(schema)
>       .set("address", Collections.singletonList("first")).build());
> } catch (Exception e) {
>   e.printStackTrace();
> }
>
> The first call to AvroParquetWriter#write attempts to add an array with a
> null element and fails, as expected, with "java.lang.NullPointerException:
> Array contains a null element at 1". However, from that point on, all
> subsequent calls to AvroParquetWriter#write, even with valid records, fail
> with "org.apache.parquet.io.InvalidRecordException: 1(r) > 0 ( schema r)",
> apparently because the state within the RecordConsumer isn't being reset
> between writes.
>
> Is this the intended behavior of the writer? And if so, does one have to
> create a new writer whenever a write fails?
>
> Best Regards,
> Øyvind Strømmen
