nevi-me opened a new issue #282:
URL: https://github.com/apache/arrow-rs/issues/282

**Describe the bug**
First documented in https://github.com/apache/arrow-rs/pull/270#issuecomment-836762589.

When trying to write some combinations of nested Arrow data to Parquet, we trigger a bounds error on the level calculations.

The most obvious thing that could be going wrong is that we're not correctly accounting for empty list slot vs null list slot. This is because the error gets triggered around the logic that does this.

**To Reproduce**
Try the below test:

```rust
#[test]
fn test_write_ipc_nested_lists() {
    let fields = vec![Field::new(
        "list_a",
        DataType::List(Box::new(Field::new(
            "list_b",
            DataType::List(Box::new(Field::new(
                "struct_c",
                DataType::Struct(vec![
                    Field::new("prim_d", DataType::Boolean, true),
                    Field::new(
                        "list_e",
                        DataType::LargeList(Box::new(Field::new(
                            "string_f",
                            DataType::LargeUtf8,
                            true,
                        ))),
                        false,
                    ),
                ]),
                true,
            ))),
            false,
        ))),
        true,
    )];
    let schema = Arc::new(Schema::new(fields));

    // making this nullable guarantees that one of the list items will be empty, triggering the error
    let batch = arrow::util::data_gen::create_random_batch(schema, 3, 0.35, 0.6).unwrap();

    // write ipc (to read in pyarrow, and write parquet from pyarrow)
    let file = File::create("arrow_nested_random.arrow").unwrap();
    let mut writer =
        arrow::ipc::writer::FileWriter::try_new(file, batch.schema().as_ref()).unwrap();
    writer.write(&batch).unwrap();
    writer.finish().unwrap();

    let file = File::create("arrow_nested_random_rust.parquet").unwrap();
    let mut writer = ArrowWriter::try_new(file.try_clone().unwrap(), batch.schema(), None)
        .expect("Unable to write file");
    // this will trigger the error in question
    writer.write(&batch).unwrap();
    writer.close().unwrap();
}
```

**Expected behavior**
The parquet file should be written correctly, and pyarrow or Spark should be able to read the data correctly.

**Additional context**
Not sure

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
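
The bug description distinguishes an empty list slot from a null list slot. As a rough illustration of why that distinction matters for level calculations, here is a minimal sketch for a single nullable list of nullable `i32` items. It is not the arrow-rs writer's code: the function name `list_levels` and the `Vec<Option<Vec<Option<i32>>>>` input type are hypothetical, and the level values assume the standard three-level Parquet list layout (optional list, repeated group, optional item), which gives a maximum definition level of 3.

```rust
/// Hypothetical helper: computes (definition, repetition) levels for one
/// nullable list column whose items are also nullable.
/// Assumed definition levels: 0 = null list, 1 = empty list,
/// 2 = null item, 3 = present item.
fn list_levels(rows: &[Option<Vec<Option<i32>>>]) -> (Vec<i16>, Vec<i16>) {
    let mut def = Vec::new();
    let mut rep = Vec::new();
    for row in rows {
        match row {
            // Null list slot: one level entry, nothing defined.
            None => {
                def.push(0);
                rep.push(0);
            }
            // Empty list slot: the list is defined but has no items.
            // It still needs exactly one level entry.
            Some(items) if items.is_empty() => {
                def.push(1);
                rep.push(0);
            }
            // Non-empty list: one level entry per item.
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    def.push(if item.is_some() { 3 } else { 2 });
                    // The first item starts a new row (0); the rest continue it (1).
                    rep.push(if i == 0 { 0 } else { 1 });
                }
            }
        }
    }
    (def, rep)
}
```

Both null and empty slots contribute one level entry but no value, so a calculation that treats them the same, or that skips one of them, would produce level buffers whose length disagrees with the offsets. That is one plausible way to end up with an out-of-bounds index, though the issue does not confirm that this is the actual cause.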
