[ https://issues.apache.org/jira/browse/ARROW-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-11271:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Rust] [Parquet] List schema to Arrow parser misinterpreting child nullability
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-11271
>                 URL: https://issues.apache.org/jira/browse/ARROW-11271
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust
>    Affects Versions: 2.0.0
>            Reporter: Neville Dipale
>            Assignee: Neville Dipale
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently do not propagate child nullability correctly when reading
> parquet files from Spark 3.0.1 (parquet-mr 1.10.1).
> For example, the schema below, taken from
> [https://github.com/apache/parquet-format/blob/master/LogicalTypes.md], is
> currently interpreted incorrectly:
>
> {code:java}
> // List<String> (list nullable, elements non-null)
> optional group my_list (LIST) {
>   repeated group list {
>     required binary element (UTF8);
>   }
> }{code}
> The Arrow type should be:
> {code:java}
> Field::new(
>   "my_list",
>   DataType::List(
>     box Field::new("element", DataType::Utf8, nullable: false),
>   ),
>   nullable: true
> ){code}
> but we currently end up with
> {code:java}
> Field::new(
>   "my_list",
>   DataType::List(
>     box Field::new("list", DataType::Utf8, nullable: true),
>   ),
>   nullable: true
> )
> {code}
> This doesn't seem to be an issue with the master branch as of opening this
> issue, so it might not be severe enough to try to force a fix into the
> 3.0.0 release.
> I tested null and non-null Spark files, and was able to read them correctly.
> This becomes an issue with nested lists, which I'm working on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
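
For illustration, the mapping the issue expects can be sketched in plain Rust. This is a minimal, self-contained model and not the actual parquet or arrow crate code: `Repetition`, `ParquetType`, `DataType`, `Field`, and `list_to_field` are hypothetical stand-ins defined here, and only UTF8 binary leaves in the standard three-level list layout are handled. The sketch takes the list's nullability from the outer group and the child's name and nullability from the innermost element, skipping the repeated middle group.

```rust
// Hypothetical stand-ins for Parquet and Arrow schema types.
// These are NOT the real `parquet` or `arrow` crate APIs.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

#[derive(Debug, Clone, PartialEq)]
enum ParquetType {
    // Assumption: every primitive leaf is `binary (UTF8)`.
    Primitive { name: String, repetition: Repetition },
    Group { name: String, repetition: Repetition, children: Vec<ParquetType> },
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Convert a three-level Parquet LIST group into a list `Field`.
/// Returns `None` if the group does not have the expected shape.
fn list_to_field(outer: &ParquetType) -> Option<Field> {
    // Level 1: the outer group carries the list's own nullability.
    let (name, repetition, children) = match outer {
        ParquetType::Group { name, repetition, children } => (name, *repetition, children),
        ParquetType::Primitive { .. } => return None,
    };

    // Level 2: exactly one repeated group; its name and repetition
    // must not leak into the Arrow child field.
    let inner = match children.as_slice() {
        [ParquetType::Group { repetition: Repetition::Repeated, children: inner, .. }] => inner,
        _ => return None,
    };

    // Level 3: the element supplies the child's name and nullability.
    let element = match inner.as_slice() {
        [ParquetType::Primitive { name, repetition }] => Field {
            name: name.clone(),
            data_type: DataType::Utf8,
            nullable: *repetition == Repetition::Optional,
        },
        // A nested list: apply the same rule recursively.
        [nested @ ParquetType::Group { .. }] => list_to_field(nested)?,
        _ => return None,
    };

    Some(Field {
        name: name.clone(),
        data_type: DataType::List(Box::new(element)),
        nullable: repetition == Repetition::Optional,
    })
}
```

Under these assumptions, the `my_list` schema from the issue maps to a nullable list whose child field is named `element` and is non-nullable. Because the element may itself be a list group, the same rule carries over to nested lists.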