[GitHub] [arrow-rs] hohav commented on issue #385: Panic when writing Parquet from non-nullable ListArray

GitBox Sun, 27 Jun 2021 17:25:59 -0700


hohav commented on issue #385:
URL: https://github.com/apache/arrow-rs/issues/385#issuecomment-869250321



   I think there may be a more fundamental issue with `ListArray`. I created a new version of my repro [here](https://github.com/hohav/arrow-parquet-list-test/tree/v2), where I create a very simple `ListArray`: `[[1], [], [2]]`. I can successfully write this to a Parquet file using `ArrowWriter`, but then `parquet meta` shows incorrect information:
   ```
   $ parquet meta test.parquet 
   
   File path:  test.parquet
   Created by: parquet-rs version 5.0.0-SNAPSHOT (build de62168a4f428e3c334e1cfa5c5db23272f313d7)
   Properties:
     ARROW:schema: /////7gAAAAQAAAAAAAKAA4ADAALAAQACgAAABQAAAAAAAABBAAKAAwAAAAIAAQACgAAAAgAAAAIAAAAAAAAAAEAAAAEAAAA3P///xwAAAAMAAAAAAABDFwAAAABAAAAHAAAAAQABAAEAAAAEAAUABAADgAPAAQAAAAIABAAAAAYAAAAIAAAAAAAAQIcAAAACAAMAAQACwAIAAAAIAAAAAAAAAEAAAAABAAAAGl0ZW0AAAAABgAAAHZhbHVlcwAA
   Schema:
   message arrow_schema {
     optional group values (LIST) {
       repeated group list {
         optional int32 item;
       }
     }
   }
   
   
   Row group 0:  count: 3  23.67 B records  start: 4  total: 71 B
   
   --------------------------------------------------------------------------------
                     type      encodings count     avg size   nulls   min / max
   values.list.item  INT32     _ RR_     3         23.67 B    1       "1" / "2"
   ```
   Notice `nulls 1`, which AFAICT is incorrect: there are no null items, only one empty list. And `parquet cat` fails entirely:
   ```
   $ parquet cat test.parquet 
   Unknown error
   java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:155)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:185)
   Caused by: java.lang.ClassCastException: optional int32 item is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:539)
        at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:489)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:91)
        at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
        at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
        at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more
   ```
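
For what it's worth, here is a rough sketch of how `[[1], [], [2]]` would be shredded into Parquet repetition/definition levels under the schema printed by `parquet meta` (optional `values` group, repeated `list` group, optional `item`). It assumes standard three-level list encoding, where an empty list is recorded as a level entry with a definition level below the maximum and no stored value. The names `Level`, `shred`, and `count_below_max` are made up for illustration and are not part of arrow-rs or parquet. If a tool counted every slot below the maximum definition level as a null, the empty list would show up as one "null" even though no item is null; whether that is what happens here is only a guess.

```rust
/// Repetition/definition level pair for one slot of the leaf column.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Level {
    rep: i16,
    def: i16,
}

// Max definition level for `values.list.item` in the schema above:
// optional group `values` (+1), repeated group `list` (+1), optional `item` (+1).
const MAX_DEF: i16 = 3;

/// Shred rows of a nullable list of nullable i32 into (levels, stored values).
fn shred(rows: &[Option<Vec<Option<i32>>>]) -> (Vec<Level>, Vec<i32>) {
    let mut levels = Vec::new();
    let mut values = Vec::new();
    for row in rows {
        match row {
            // Null list: nothing on the path is defined.
            None => levels.push(Level { rep: 0, def: 0 }),
            // Empty list: `values` is defined, but there is no `list` entry.
            Some(items) if items.is_empty() => levels.push(Level { rep: 0, def: 1 }),
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    // The first item starts a new row; later items repeat at `list`.
                    let rep = if i == 0 { 0 } else { 1 };
                    match item {
                        Some(v) => {
                            levels.push(Level { rep, def: MAX_DEF });
                            values.push(*v);
                        }
                        // Null item: defined down to `list`, but `item` is missing.
                        None => levels.push(Level { rep, def: MAX_DEF - 1 }),
                    }
                }
            }
        }
    }
    (levels, values)
}

/// Count slots that carry no stored value (definition level below the maximum).
fn count_below_max(levels: &[Level]) -> usize {
    levels.iter().filter(|l| l.def < MAX_DEF).count()
}

fn main() {
    // The array from the repro: [[1], [], [2]]
    let rows = vec![Some(vec![Some(1)]), Some(vec![]), Some(vec![Some(2)])];
    let (levels, values) = shred(&rows);
    println!("levels: {:?}", levels);
    println!("values: {:?}", values);
    println!("slots below max definition level: {}", count_below_max(&levels));
}
```

Under these assumptions the column has three level entries but only two stored values, which would at least line up with `count: 3` and `nulls 1` in the output above. It does not explain the `parquet cat` failure.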


