tustvold opened a new issue #1480:
URL: https://github.com/apache/arrow-rs/issues/1480
**Describe the bug**
Currently, when a nested `NullArray` is written to Parquet, the definition
levels are such that not all values are actually null. This is masked by bugs
that prevent reading such arrays (fixes are in flight), and also because
`NullArrayReader` ignores any value data that may be present.
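For reference, here is a minimal sketch of the definition and repetition levels one would expect for the value written in the reproduction below. It assumes the standard three-level Parquet LIST encoding, schematically `optional group emptylist (LIST) { repeated group list { optional item } }`. The helper `list_levels` is hypothetical and not part of arrow-rs; it models the column with plain `Option`s.

```rust
/// Hypothetical helper: definition and repetition levels for an optional
/// list column with optional items, assuming three-level LIST encoding.
/// Definition levels: 0 = null list, 1 = empty list,
/// 2 = null item, 3 = non-null item (the maximum).
fn list_levels(rows: &[Option<Vec<Option<i32>>>]) -> (Vec<i16>, Vec<i16>) {
    let mut def = Vec::new();
    let mut rep = Vec::new();
    for row in rows {
        match row {
            // A null list contributes a single level pair.
            None => {
                def.push(0);
                rep.push(0);
            }
            // An empty list also contributes a single level pair.
            Some(items) if items.is_empty() => {
                def.push(1);
                rep.push(0);
            }
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    // The first item starts a new row (rep = 0);
                    // later items continue the same list (rep = 1).
                    rep.push(if i == 0 { 0 } else { 1 });
                    def.push(if item.is_some() { 3 } else { 2 });
                }
            }
        }
    }
    (def, rep)
}

fn main() {
    // [[], null, [null, null]], the value built in the reproduction.
    let rows: Vec<Option<Vec<Option<i32>>>> =
        vec![Some(vec![]), None, Some(vec![None, None])];
    let (def, rep) = list_levels(&rows);
    assert_eq!(def, vec![1, 0, 2, 2]);
    assert_eq!(rep, vec![0, 0, 0, 1]);
    // No level reaches the maximum of 3, so no values should be written.
    assert!(def.iter().all(|&d| d < 3));
    println!("def = {:?}, rep = {:?}", def, rep);
}
```

Under these assumptions every item of the column is null, so no definition level should equal 3. A level of 3 tells a reader that a real value follows in the data page, which is one way a reader could end up decoding non-null values.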
**To Reproduce**
```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{Array, ArrayData, ListArray, NullArray};
use arrow::buffer::Buffer;
use arrow::datatypes::{DataType, Field, Schema, ToByteSlice};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

#[test]
fn write_nested_null_array() {
    let null_field = Field::new("item", DataType::Null, true);
    let list_field = Field::new("emptylist", DataType::List(Box::new(null_field.clone())), true);
    let schema = Schema::new(vec![list_field]);

    // Build [[], null, [null, null]]
    let a_values = NullArray::new(2);
    // Offsets: row 0 = 0..0 (empty), row 1 = 0..0 (null), row 2 = 0..2
    let a_value_offsets = Buffer::from(&[0i32, 0, 0, 2].to_byte_slice());
    let a_list_data = ArrayData::builder(DataType::List(Box::new(null_field)))
        .len(3)
        .add_buffer(a_value_offsets)
        // Validity bitmap (least significant bit first): rows 0 and 2 valid, row 1 null
        .null_bit_buffer(Buffer::from(vec![0b00000101]))
        .add_child_data(a_values.data().clone())
        .build()
        .unwrap();
    let a = ListArray::from(a_list_data);

    assert!(a.is_valid(0));
    assert!(!a.is_valid(1));
    assert!(a.is_valid(2));
    assert_eq!(a.value(0).len(), 0);
    assert_eq!(a.value(2).len(), 2);
    assert_eq!(a.value(2).null_count(), 2);

    let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a)]).unwrap();

    let file = File::create("temp.parquet").unwrap();
    let mut writer = ArrowWriter::try_new(file, batch.schema(), None).unwrap();
    writer.write(&batch).unwrap();
    writer.close().unwrap();
}
```
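The reproduction encodes `[[], null, [null, null]]` through an offsets buffer and a validity bitmap. As a sketch of how those two buffers are interpreted under the Arrow columnar format, the hypothetical helper `decode_list` (not an arrow-rs API) turns them into per-row child ranges, with `None` for a null row.

```rust
use std::ops::Range;

/// Hypothetical helper: decode an Arrow-style list layout (i32 offsets plus
/// a least-significant-bit-first validity bitmap) into child ranges per row.
fn decode_list(offsets: &[i32], validity: &[u8], len: usize) -> Vec<Option<Range<usize>>> {
    (0..len)
        .map(|i| {
            // Bit i lives in byte i / 8, at bit position i % 8.
            let valid = (validity[i / 8] & (1 << (i % 8))) != 0;
            if valid {
                // Row i spans child elements offsets[i]..offsets[i + 1].
                Some(offsets[i] as usize..offsets[i + 1] as usize)
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    // Offsets and validity byte used in the reproduction.
    let offsets = [0i32, 0, 0, 2];
    let rows = decode_list(&offsets, &[0b0000_0101], 3);
    let expected: Vec<Option<Range<usize>>> = vec![Some(0..0), None, Some(0..2)];
    assert_eq!(rows, expected);
    // Row 0 is an empty list, row 1 is null, and row 2 covers both
    // slots of the two-element NullArray child.
    println!("{:?}", rows);
}
```

Note that the empty row 0 and the null row 1 have identical offsets; only the validity bit distinguishes them.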
Then read the file back with DuckDB's Python API:
```
>>> import duckdb
>>> duckdb.query("select * from 'temp.parquet'").fetchall()
[(None,), (None,), ([0, 0],)]
```
The third row should hold two nulls, but DuckDB reads `[0, 0]`: we mysteriously have a null array containing non-null data :scream:
**Expected behavior**
```
>>> duckdb.query("select * from 'temp.parquet'").fetchall()
[(None,), (None,), ([None, None],)]
```
**Additional context**
This is likely a consequence of a somewhat surprising quirk of `NullArray`:
it does not actually contain a null bitmask, because it contains no buffers
at all.
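One plausible way this quirk could produce the bug is sketched below with hypothetical helpers (`is_valid` and `item_def_level` are illustrations, not arrow-rs APIs). The Arrow columnar format lets an array with no nulls omit its validity bitmap, so generic code commonly treats a missing bitmap as "all valid"; the Null layout, however, allocates no buffers while every slot is null. Level computation that only consults the bitmap would then mark each `NullArray` item as non-null. The levels assume three-level LIST encoding, where 2 means a null item and 3 a non-null item.

```rust
/// Hypothetical validity check following the general Arrow convention:
/// when an array has no validity bitmap, every slot is treated as valid.
fn is_valid(validity: Option<&[u8]>, i: usize) -> bool {
    match validity {
        Some(bits) => (bits[i / 8] & (1 << (i % 8))) != 0,
        // No bitmap: generic code assumes the array has no nulls.
        None => true,
    }
}

/// Hypothetical definition level for a list item, derived only from the
/// child's validity bitmap (2 = null item, 3 = non-null item).
fn item_def_level(validity: Option<&[u8]>, i: usize) -> i16 {
    if is_valid(validity, i) { 3 } else { 2 }
}

fn main() {
    // A NullArray has no buffers, hence no validity bitmap,
    // even though all of its slots are null.
    let null_array_validity: Option<&[u8]> = None;
    let levels: Vec<i16> = (0..2).map(|i| item_def_level(null_array_validity, i)).collect();
    // Bitmap-only logic assigns the maximum (non-null) level to both items.
    assert_eq!(levels, vec![3, 3]);

    // Treating the child as explicitly all-null gives the expected levels.
    let bitmap = [0b0000_0000u8];
    let fixed: Vec<i16> = (0..2).map(|i| item_def_level(Some(&bitmap[..]), i)).collect();
    assert_eq!(fixed, vec![2, 2]);
    println!("bitmap-only: {:?}, all-null: {:?}", levels, fixed);
}
```

In other words, a writer handling `DataType::Null` would need to treat every slot as null explicitly rather than relying on the absent bitmap.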