Qingyou Meng created ARROW-11263:
------------------------------------

             Summary: [Rust] problem of Field nullable
                 Key: ARROW-11263
                 URL: https://issues.apache.org/jira/browse/ARROW-11263
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Qingyou Meng


Quoting from section *Schema message*

 
[https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#schema-message]

 
{noformat}
Whether the field is semantically nullable. While this has no bearing on the 
array's physical layout, many systems distinguish nullable and non-nullable 
fields and we want to allow them to preserve this metadata to enable faithful 
schema round trips.{noformat}
This can be read as: for a field with nullable set to true, when a data 
processor encounters null array data from that field, it CAN either continue 
or refuse to process it.

In the current Rust implementation, apart from reading Fields from a schema, we 
also construct `Field` in datafusion and with `Field::new` in 
arrow::array::*StructArray*.
 * in datafusion, nullable is determined by the DF schema
 * in arrow::array::StructArray::try_from(values: Vec<(&str, ArrayRef)>), 
nullable is determined by the actual data. This is error-prone if the 
ArrayRef's null buffer is all 1s (as built by a builder). The following test 
shows a bug:
 
{noformat}
    #[test]
    fn test_struct_bug() {
        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
            Some(1),
            Some(2),
            Some(3),
        ]));
        let array = StructArray::try_from(vec![("f1", ints.clone())])
            .unwrap()
            .data();
        let arrays = vec![array.as_ref()];
        let mut mutable = MutableArrayData::new(arrays, false, 0);
        mutable.extend(0, 1, 3);
        let data = mutable.freeze();
        let array = StructArray::from(Arc::new(data));
        let expected = StructArray::try_from(vec![
            ("f1", ints.slice(1, 2)),
        ])
        .unwrap();
        assert_eq!(array, expected);
    }{noformat}
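The mismatch can also be illustrated without the arrow crate. The following is a minimal sketch in plain Rust under a simplified model: `Column`, `ToyField` and `field_from_presence` are hypothetical names, not arrow APIs, and the validity bitmap is modeled as an `Option<Vec<bool>>`. It only aims to show how deriving nullable from the mere presence of a validity bitmap lets two columns with identical values produce different fields.

```rust
/// Hypothetical stand-in for an Int32 array: values plus an optional validity bitmap.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    values: Vec<i32>,
    /// `None` means "no bitmap, every slot is valid".
    validity: Option<Vec<bool>>,
}

/// Hypothetical stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    nullable: bool,
}

impl Column {
    /// Mimics a builder fed with `Option` values: it always materializes a bitmap,
    /// even when every value is `Some`.
    fn from_options(items: &[Option<i32>]) -> Self {
        Column {
            values: items.iter().map(|v| v.unwrap_or(0)).collect(),
            validity: Some(items.iter().map(|v| v.is_some()).collect()),
        }
    }

    /// Mimics a code path that produces no bitmap when nothing is null.
    fn from_values(items: &[i32]) -> Self {
        Column { values: items.to_vec(), validity: None }
    }

    /// Number of slots marked invalid in the bitmap (0 when there is no bitmap).
    fn null_count(&self) -> usize {
        self.validity
            .as_ref()
            .map_or(0, |bits| bits.iter().filter(|b| !**b).count())
    }
}

/// Data-driven inference (assumed for illustration): nullable iff a bitmap is present.
fn field_from_presence(name: &str, col: &Column) -> ToyField {
    ToyField { name: name.to_string(), nullable: col.validity.is_some() }
}
```

In this model, `Column::from_options(&[Some(2), Some(3)])` and `Column::from_values(&[2, 3])` hold the same values and both have a null count of zero, yet `field_from_presence` marks only the first one as nullable, so any equality check that includes the field metadata fails.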
Conclusions:
 * It's questionable to set a Field's nullable according to the data.
 * Perhaps builders should set the null buffer back to None when the buffer has 
all bits set.
 * StructArray::TryFrom<Vec<(&str, ArrayRef)>> sets the wrong nullable when the 
null buffer is Some with all bits set.
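
As a rough illustration of the second point, here is a sketch in plain Rust of what such a normalization could look like. It assumes the validity bitmap is modeled as an `Option<Vec<bool>>`. Both `normalize_validity` and `nullable_from_null_count` are hypothetical helpers, not functions of the arrow crate.

```rust
/// Hypothetical helper: drops a validity bitmap whose bits are all set,
/// so that "no nulls" has a single representation (`None`).
fn normalize_validity(validity: Option<Vec<bool>>) -> Option<Vec<bool>> {
    match validity {
        // Every slot is valid, so the bitmap carries no information.
        Some(bits) if bits.iter().all(|&valid| valid) => None,
        // Either no bitmap at all, or at least one real null: keep as-is.
        other => other,
    }
}

/// Hypothetical helper: reports nullable only when at least one slot is
/// actually null, regardless of whether a bitmap is present.
fn nullable_from_null_count(validity: &Option<Vec<bool>>) -> bool {
    validity
        .as_ref()
        .map_or(false, |bits| bits.iter().any(|&valid| !valid))
}
```

Note that `nullable_from_null_count` still derives nullable from the data, so it only removes the all-bits-set discrepancy. It does not address the first point, that nullable arguably belongs to the schema and not to the data.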

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
