hohav commented on issue #385: URL: https://github.com/apache/arrow-rs/issues/385#issuecomment-869250321
I think there may be a more fundamental issue with `ListArray`. I created a new version of my repro [here](https://github.com/hohav/arrow-parquet-list-test/tree/v2), where I create a very simple `ListArray`: `[[1], [], [2]]`. I can successfully write this to a Parquet file using `ArrowWriter`, but then `parquet meta` shows incorrect information:

```
$ parquet meta test.parquet

File path:  test.parquet
Created by: parquet-rs version 5.0.0-SNAPSHOT (build de62168a4f428e3c334e1cfa5c5db23272f313d7)
Properties:
  ARROW:schema: /////7gAAAAQAAAAAAAKAA4ADAALAAQACgAAABQAAAAAAAABBAAKAAwAAAAIAAQACgAAAAgAAAAIAAAAAAAAAAEAAAAEAAAA3P///xwAAAAMAAAAAAABDFwAAAABAAAAHAAAAAQABAAEAAAAEAAUABAADgAPAAQAAAAIABAAAAAYAAAAIAAAAAAAAQIcAAAACAAMAAQACwAIAAAAIAAAAAAAAAEAAAAABAAAAGl0ZW0AAAAABgAAAHZhbHVlcwAA
Schema:
message arrow_schema {
  optional group values (LIST) {
    repeated group list {
      optional int32 item;
    }
  }
}

Row group 0:  count: 3  23.67 B records  start: 4  total: 71 B
--------------------------------------------------------------------------------
                  type   encodings  count  avg size  nulls  min / max
values.list.item  INT32  _ RR_      3      23.67 B   1      "1" / "2"
```

Notice `nulls 1`, which AFAICT is incorrect: there are no null items, only one empty list.
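One possible reading of that `nulls 1`, offered only as a sketch: under the schema above, the column stores one definition-level entry per item, plus one entry for each empty or null list. If the statistic counts every entry whose definition level is below the maximum, the empty list would be counted as a "null" even though no item is null. Whether parquet-rs or `parquet meta` actually computes it this way is an assumption here, not something confirmed in this thread. The `list_levels` helper below is hypothetical (it is not an arrow-rs or parquet-rs API) and uses only the standard library.

```rust
// Hypothetical helper, NOT part of arrow-rs or parquet-rs. It computes the
// definition and repetition levels for a nullable list of nullable i32 items
// under the schema shown above:
//   optional group values (LIST) { repeated group list { optional int32 item; } }
// Definition levels: 0 = null list, 1 = empty list, 2 = null item, 3 = present item.
const MAX_DEF_LEVEL: i16 = 3;

fn list_levels(rows: &[Option<Vec<Option<i32>>>]) -> (Vec<i16>, Vec<i16>) {
    let mut def_levels = Vec::new();
    let mut rep_levels = Vec::new();
    for row in rows {
        match row {
            // A null list still occupies one level entry.
            None => {
                def_levels.push(0);
                rep_levels.push(0);
            }
            // An empty list also occupies one level entry, with no value.
            Some(items) if items.is_empty() => {
                def_levels.push(1);
                rep_levels.push(0);
            }
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    def_levels.push(if item.is_some() { MAX_DEF_LEVEL } else { 2 });
                    // 0 starts a new row; 1 continues the current list.
                    rep_levels.push(if i == 0 { 0 } else { 1 });
                }
            }
        }
    }
    (def_levels, rep_levels)
}

fn main() {
    // The array from the repro: [[1], [], [2]]
    let rows = vec![Some(vec![Some(1)]), Some(vec![]), Some(vec![Some(2)])];
    let (def_levels, rep_levels) = list_levels(&rows);

    // Entries below the maximum definition level (assumed "nulls" counting).
    let below_max = def_levels.iter().filter(|&&d| d < MAX_DEF_LEVEL).count();
    // Entries that are genuinely null items.
    let null_items = def_levels.iter().filter(|&&d| d == 2).count();

    println!("def levels: {:?}", def_levels);
    println!("rep levels: {:?}", rep_levels);
    println!("entries below max def level: {}", below_max);
    println!("actual null items: {}", null_items);
}
```

For `[[1], [], [2]]` this yields three level entries (matching `count 3`), one of which is below the maximum definition level, and zero null items.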
And `parquet cat` fails entirely:

```
$ parquet cat test.parquet
Unknown error
java.lang.RuntimeException: Failed on record 0
	at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
	at org.apache.parquet.cli.Main.run(Main.java:155)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.parquet.cli.Main.main(Main.java:185)
Caused by: java.lang.ClassCastException: optional int32 item is not a group
	at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
	at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
	at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
	at org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
	at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:539)
	at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:489)
	at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
	at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
	at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:91)
	at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
	at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
	at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
	at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
	at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:344)
	at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
	at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
	... 3 more
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
