I've added an answer to the ticket:
https://issues.apache.org/jira/browse/PARQUET-1887

And created a PR for anyone who's interested:
https://github.com/apache/parquet-mr/pull/804

Cheers, Fokko

On Wed, 22 Jul 2020 at 12:29, Driesprong, Fokko <[email protected]> wrote:

> Thanks for reaching out Øyvind.
>
> Which version of Parquet are you using? Would it be possible to open a
> ticket and attach the person.parquet file to it? I don't see anything weird
> in your schema or code.
>
> Cheers, Fokko
>
> On Wed, 22 Jul 2020 at 00:12, Øyvind Strømmen <[email protected]> wrote:
>
>> Hi,
>>
>> Please see code below that reproduces the scenario:
>>
>> Schema schema = new Schema.Parser().parse("""
>>   {
>>     "type": "record",
>>     "name": "person",
>>     "fields": [
>>       {
>>         "name": "address",
>>         "type": [
>>           "null",
>>           {
>>             "type": "array",
>>             "items": "string"
>>           }
>>         ],
>>         "default": null
>>       }
>>     ]
>>   }
>> """
>> );
>>
>> ParquetWriter<GenericRecord> writer = AvroParquetWriter
>>   .<GenericRecord>builder(new org.apache.hadoop.fs.Path("/tmp/person.parquet"))
>>   .withSchema(schema)
>>   .build();
>>
>> // First write: the array contains a null element, so this fails as expected.
>> try {
>>   writer.write(new GenericRecordBuilder(schema)
>>     .set("address", Arrays.asList("first", null, "last"))
>>     .build());
>> } catch (Exception e) {
>>   e.printStackTrace();
>> }
>>
>> // Second write: the record is valid, but the write still fails.
>> try {
>>   writer.write(new GenericRecordBuilder(schema)
>>     .set("address", Collections.singletonList("first"))
>>     .build());
>> } catch (Exception e) {
>>   e.printStackTrace();
>> }
>>
>>
>> The first call to AvroParquetWriter#write attempts to add an array with a
>> null element and fails, as expected, with
>> "java.lang.NullPointerException: Array contains a null element at 1".
>> However, from this point on all subsequent calls to AvroParquetWriter#write
>> (with valid records) fail with
>> "org.apache.parquet.io.InvalidRecordException: 1(r) > 0 ( schema r)",
>> apparently because the state within the RecordConsumer isn't being reset
>> between writes.
>>
>> Is this the intended behavior of the writer? And if so, does one have to
>> create a new writer whenever a write fails?
>>
>> Best Regards,
>> Øyvind Strømmen
>>
>
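
One way to sidestep the broken writer state described in the quoted message is to reject records whose array field contains a null element before they reach AvroParquetWriter#write. The following is only a minimal sketch in plain Java, assuming null array elements are the only cause of failed writes. The class ArrayNullCheck and the method firstNullIndex are hypothetical names and are not part of parquet-mr or Avro.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of parquet-mr or Avro.
final class ArrayNullCheck {

    /**
     * Returns the index of the first null element in the list, or -1 if there is none.
     * A null list is treated as valid because the schema in the thread allows a null "address".
     */
    static int firstNullIndex(List<?> values) {
        if (values == null) {
            return -1;
        }
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i) == null) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> bad = Arrays.asList("first", null, "last");
        List<String> good = Collections.singletonList("first");

        for (List<String> address : Arrays.asList(bad, good)) {
            int nullAt = firstNullIndex(address);
            if (nullAt >= 0) {
                // Skip the record instead of passing it to the writer.
                System.out.println("rejected: null element at " + nullAt);
            } else {
                // In the real program, writer.write(...) would be called here.
                System.out.println("ok to write: " + address);
            }
        }
    }
}
```

This keeps invalid records away from the writer, but it does not answer whether a writer can be reused after a failed write; that question is tracked in the ticket linked at the top of the thread.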
