On 29/07/2021 at 14:25, Benjamin Kietzman wrote:
Convention 2 seems more correct to me: if a UnionArray cannot
contain top-level nulls, then a UnionScalar should not be nullable
either. Furthermore, I think it's reasonable for
`MakeNullScalar(some_union_type)->is_valid` to be true, though
the doc comments for both `MakeNullScalar` and `MakeArrayOfNull`
should explicitly warn about the special case that unions
represent.
It's worth noting that convention 1 doesn't round-trip through the
scalar. Consider the type
auto t = dense_union({field("i", int8()), field("b", boolean())},
                     /*type_codes=*/{4, 8});
If we define an array of this type with a single element like so:
auto a = ArrayFromJSON(t, R"([[8, null]])");
then the scalar returned by `a->GetScalar(0)` is one of:
conv_1 = { .is_valid = false, .value = nullptr }
conv_2 = { .is_valid = true, .value = MakeNullScalar(boolean()) }
Broadcasting `conv_2` to an array with `MakeArrayFromScalar` produces
a correctly round-tripped array with type code 8. On the other hand,
broadcasting `conv_1` produces
ArrayFromJSON(t, R"([[4, null]])");
(In general, the type code is replaced by whichever type code was
declared first.)
Note that https://github.com/apache/arrow/pull/10817 should hopefully
fix this by adding a `type_code` field to UnionScalar.
Regards
Antoine.