Thanks for reaching out, Øyvind.

Which version of Parquet are you using? Would it be possible to open a
ticket and attach the person.parquet file to it? I don't see anything wrong
in your schema or code.

Cheers, Fokko

On Wed, Jul 22, 2020 at 00:12, Øyvind Strømmen <[email protected]> wrote:

> Hi,
>
> Please see code below that reproduces the scenario:
>
> import java.util.Arrays;
> import java.util.Collections;
>
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.generic.GenericRecordBuilder;
> import org.apache.parquet.avro.AvroParquetWriter;
> import org.apache.parquet.hadoop.ParquetWriter;
>
> Schema schema = new Schema.Parser().parse("""
>   {
>     "type": "record",
>     "name": "person",
>     "fields": [
>       {
>         "name": "address",
>         "type": [
>           "null",
>           {
>             "type": "array",
>             "items": "string"
>           }
>         ],
>         "default": null
>       }
>     ]
>   }
> """
> );
>
> ParquetWriter<GenericRecord> writer = AvroParquetWriter
>   .<GenericRecord>builder(new org.apache.hadoop.fs.Path("/tmp/person.parquet"))
>   .withSchema(schema)
>   .build();
>
> // First write: the array contains a null element, so this fails.
> try {
>   writer.write(new GenericRecordBuilder(schema)
>     .set("address", Arrays.asList("first", null, "last"))
>     .build());
> } catch (Exception e) {
>   e.printStackTrace();
> }
>
> // Second write: a perfectly valid record, yet this fails as well.
> try {
>   writer.write(new GenericRecordBuilder(schema)
>     .set("address", Collections.singletonList("first"))
>     .build());
> } catch (Exception e) {
>   e.printStackTrace();
> }
>
>
> The first call to AvroParquetWriter#write attempts to write an array
> containing a null element and fails, as expected, with
> "java.lang.NullPointerException: Array contains a null element at 1".
> However, from that point on, every subsequent call to
> AvroParquetWriter#write, even with valid records, fails with
> "org.apache.parquet.io.InvalidRecordException: 1(r) > 0 ( schema r)",
> apparently because the state within the RecordConsumer isn't reset
> between writes.
>
> Is this the intended behavior of the writer? And if so, does one have to
> create a new writer whenever a write fails?
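>
> One workaround might be to validate each record before writing it, so
> that a bad record never reaches the writer at all. A minimal sketch
> (untested; assumes Avro's GenericData.validate rejects the null array
> element before the writer sees it):
>
> import org.apache.avro.generic.GenericData;
>
> GenericRecord candidate = new GenericRecordBuilder(schema)
>   .set("address", Arrays.asList("first", null, "last"))
>   .build();
>
> // validate(...) returns false for the null array element, so the
> // record is skipped and the writer's internal state stays intact.
> if (GenericData.get().validate(schema, candidate)) {
>   writer.write(candidate);
> } else {
>   System.err.println("Skipping invalid record");
> }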
>
> Best Regards,
> Øyvind Strømmen
>
