[ https://issues.apache.org/jira/browse/PARQUET-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164478#comment-17164478 ]

Fokko Driesprong commented on PARQUET-1887:
-------------------------------------------

Flushing it to disk feels a bit harsh to me; if it happens often, you would
flush very often.
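
If the goal is just to keep one bad record from poisoning the writer, an
alternative to flushing is to validate records at the Avro level before they
ever reach the writer. A minimal sketch (GenericData#validate is standard
Avro, but it is an assumption that it rejects everything the Parquet writer
would reject):

{code:java}
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.parquet.hadoop.ParquetWriter;

public final class SafeWrites {
    // Reject invalid records (e.g. an array containing null) up front,
    // so the writer's internal column state is never left half-updated.
    static void safeWrite(ParquetWriter<GenericRecord> writer, Schema schema,
            GenericRecord record) throws IOException {
        if (!GenericData.get().validate(schema, record)) {
            throw new IllegalArgumentException("Record does not match schema: " + record);
        }
        writer.write(record);
    }
}
{code}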

> Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail
> --------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1887
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1887
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>    Affects Versions: 1.11.0, 1.8.3
>            Reporter: Øyvind Strømmen
>            Priority: Major
>         Attachments: person1_11_0.parquet, person1_8_3.parquet
>
>
> Please see sample code below:
> {code:java}
> import java.util.Arrays;
> 
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.generic.GenericRecordBuilder;
> import org.apache.parquet.avro.AvroParquetWriter;
> import org.apache.parquet.hadoop.ParquetWriter;
> 
> Schema schema = new Schema.Parser().parse("""
>         {
>           "type": "record",
>           "name": "person",
>           "fields": [
>             {
>               "name": "address",
>               "type": [
>                 "null",
>                 {
>                   "type": "array",
>                   "items": "string"
>                 }
>               ],
>               "default": null
>             }
>           ]
>         }
>         """
> );
> ParquetWriter<GenericRecord> writer = AvroParquetWriter
>         .<GenericRecord>builder(new org.apache.hadoop.fs.Path("/tmp/person.parquet"))
>         .withSchema(schema)
>         .build();
> try {
>     // To trigger the exception, write an array with a null element.
>     writer.write(new GenericRecordBuilder(schema)
>             .set("address", Arrays.asList("first", null, "last"))
>             .build());
> } catch (Exception e) {
>     e.printStackTrace(); // "java.lang.NullPointerException: Array contains a null element at 1"
> }
> try {
>     // At this point all future calls to writer.write will fail.
>     writer.write(new GenericRecordBuilder(schema)
>             .set("address", Arrays.asList("foo", "bar"))
>             .build());
> } catch (Exception e) {
>     e.printStackTrace(); // "org.apache.parquet.io.InvalidRecordException: 1(r) > 0 ( schema r)"
> }
> writer.close();
> {code}
> It seems to me this is caused by state not being reset between writes. Is
> this the intended behavior of the writer? And if so, does one have to create
> a new writer whenever a write fails?
> I'm able to reproduce this using both parquet 1.8.3 and 1.11.0, and have 
> attached a sample parquet file for each version.
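
For anyone hitting this in the meantime, the reporter's own suggestion, creating
a new writer whenever a write fails, seems to be the practical workaround. A
rough, untested sketch (newWriter and nextPath are hypothetical helpers wrapping
the builder call from the reproduction above, and records stands in for any
Iterable<GenericRecord>):

{code:java}
ParquetWriter<GenericRecord> writer = newWriter(schema, nextPath());
for (GenericRecord record : records) {
    try {
        writer.write(record);
    } catch (RuntimeException e) {
        // The writer's column state is now inconsistent (this bug), so
        // abandon it. Closing a corrupted writer may itself throw.
        try { writer.close(); } catch (Exception ignored) { }
        // Parquet will not overwrite an existing file, so use a fresh path.
        writer = newWriter(schema, nextPath());
    }
}
writer.close();
{code}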


