Qingyou Meng created ARROW-11263:
------------------------------------
Summary: [Rust] problem of Field nullable
Key: ARROW-11263
URL: https://issues.apache.org/jira/browse/ARROW-11263
Project: Apache Arrow
Issue Type: Bug
Reporter: Qingyou Meng
Quoting from section *Schema message*
[https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#schema-message]
{noformat}
Whether the field is semantically nullable. While this has no bearing on the
array's physical layout, many systems distinguish nullable and non-nullable
fields and we want to allow them to preserve this metadata to enable faithful
schema round trips.{noformat}
This can be read as: for a field with nullable set as true, when encounters
null array data from the field, data processor CAN continue or refuse to
process.
In current rust implementation, apart from read Fields from schema, we also
construct `Field` with datafusion and`Field::new`in arrow::array::*StructArray*.
* in datafusion, the nullable is determined by DF schema
* in arrow::array::StructArray::
try_from(values: Vec<(&str, ArrayRef)>) , the nullable is determined actual
data. This is error-prone if ArrayRef's null buffer are all 1s (built by
builder). The following test shows a bug:
{noformat}
#[test]
fn test_struct_bug() {
let ints: ArrayRef = Arc::new(Int32Array::from(vec![
Some(1),
Some(2),
Some(3),
]));
let array = StructArray::try_from(vec![("f1", ints.clone())])
.unwrap()
.data();
let arrays = vec![array.as_ref()];
let mut mutable = MutableArrayData::new(arrays, false, 0);
mutable.extend(0, 1, 3);
let data = mutable.freeze();
let array = StructArray::from(Arc::new(data));
let expected = StructArray::try_from(vec![
("f1", ints.slice(1, 2)),
])
.unwrap();
assert_eq!(array, expected);
}{noformat}
Conclusions:
* It's questionable to set Field's nullable according to data.
* Perhaps builders should set null buffer back to None when the buffer has all
bits set.
* StructArray::
TryFrom<Vec<(&str, ArrayRef)>> sets wrong nullable when null buffer is Some
with all bits set.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)