Le 29/07/2021 à 14:25, Benjamin Kietzman a écrit :
Convention 2 seems more correct to me; if a UnionArray cannot
contain top level nulls then a UnionScalar should not be nullable.
Furthermore, I think that it's reasonable for
`MakeNullScalar(some_union_type)->is_valid` to be true, though
the doccomment for both `MakeNullScalar` and `MakeArrayOfNull`
should include explicit warnings of the special case which unions
represent.

It's worth noting that convention 1 doesn't round trip through the
scalar. Consider the type

     t = dense_union({field("i", int8()), field("b", boolean())},
/*type_codes=*/{4, 8});

If we define an array of this type with a single element like so:

     a = ArrayFromJSON(t, R"([{"type_code": 8, "value": null}])");

then the scalar returned by `a.GetScalar(0)` is one of:

     conv_1 = { .is_valid = false, .value = nullptr }
     conv_2 = { .is_valid = true, .value = MakeNullScalar(boolean()) }

... on broadcasting `conv_2` to an array with `MakeArrayFromScalar` we
produce a correctly round tripped array with type_codes of 8. On the other
hand, broadcasting `conv_1` produces

     ArrayFromJSON(t, R"([{"type_code": 4, "value": null}])");

(In general replacing the type code with whichever type code was
declared first.)

Note that https://github.com/apache/arrow/pull/10817 should hopefully fix this by adding a `type_code` field to UnionScalar.

Regards

Antoine.

Reply via email to