Since we can have duplicate types in a union, having UnionScalar to disambiguate the right type code would still be useful, I think.
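
To make the duplicate-type case concrete, here is a rough sketch using toy stand-ins rather than Arrow's real classes. The names `ToyField`, `ToyUnboxedScalar`, `ToyUnionScalar`, and `CandidateTypeCodes` are hypothetical and exist only for illustration. It shows that when two union children share a type, an unboxed scalar matches more than one type code, while a wrapper that carries the code stays unambiguous.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Arrow's classes.
struct ToyField {
  std::string name;
  std::string type;  // child type name, e.g. "int8"
  int8_t type_code;  // union type code assigned to this child
};

// An "unboxed" scalar knows its own type but not which union child it came from.
struct ToyUnboxedScalar {
  std::string type;
  int value;
};

// A union scalar additionally records the type code of the child.
struct ToyUnionScalar {
  int8_t type_code;
  ToyUnboxedScalar value;
};

// Collect every type code whose child type matches the unboxed scalar's type.
// More than one result means the unboxed scalar alone is ambiguous.
std::vector<int8_t> CandidateTypeCodes(const std::vector<ToyField>& fields,
                                       const ToyUnboxedScalar& scalar) {
  std::vector<int8_t> codes;
  for (const auto& field : fields) {
    if (field.type == scalar.type) codes.push_back(field.type_code);
  }
  return codes;
}
```

With two `int8` children declared under type codes 4 and 8, `CandidateTypeCodes` returns both codes for an unboxed `int8` value, whereas a `ToyUnionScalar` built with `type_code = 8` identifies the child directly.
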
-David

On Thu, Jul 29, 2021, at 14:51, Wes McKinney wrote:
> Sort of a related question is whether UnionScalar needs to exist at
> all, versus returning the "unboxed" scalar from the corresponding
> array to the requested value slot (e.g. if it's a union of int and
> string, return either IntXScalar or StringScalar)
>
> On Thu, Jul 29, 2021 at 10:37 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> >
> > On 29/07/2021 at 14:25, Benjamin Kietzman wrote:
> > > Convention 2 seems more correct to me; if a UnionArray cannot
> > > contain top level nulls then a UnionScalar should not be nullable.
> > > Furthermore, I think that it's reasonable for
> > > `MakeNullScalar(some_union_type)->is_valid` to be true, though
> > > the doccomment for both `MakeNullScalar` and `MakeArrayOfNull`
> > > should include explicit warnings of the special case which unions
> > > represent.
> > >
> > > It's worth noting that convention 1 doesn't round trip through the
> > > scalar. Consider the type
> > >
> > >     t = dense_union({field("i", int8()), field("b", boolean())},
> > >                     /*type_codes=*/{4, 8});
> > >
> > > If we define an array of this type with a single element like so:
> > >
> > >     a = ArrayFromJSON(t, R"([{"type_code": 8, "value": null}])");
> > >
> > > then the scalar returned by `a.GetScalar(0)` is one of:
> > >
> > >     conv_1 = { .is_valid = false, .value = nullptr }
> > >     conv_2 = { .is_valid = true, .value = MakeNullScalar(boolean()) }
> > >
> > > ... on broadcasting `conv_2` to an array with `MakeArrayFromScalar` we
> > > produce a correctly round tripped array with type_codes of 8. On the other
> > > hand, broadcasting `conv_1` produces
> > >
> > >     ArrayFromJSON(t, R"([{"type_code": 4, "value": null}])");
> > >
> > > (In general replacing the type code with whichever type code was
> > > declared first.)
> >
> > Note that https://github.com/apache/arrow/pull/10817 should hopefully
> > fix this by adding a `type_code` field to UnionScalar.
> >
> > Regards
> >
> > Antoine.
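
The round-trip difference between the two conventions in the quoted message can be sketched with a small toy model. None of this is Arrow's actual API: `ToySlot`, `ToyUnionScalar`, `GetScalarConv1`, `GetScalarConv2`, and `Broadcast` are hypothetical names, and the toy stores the type code directly where the quoted `conv_2` would carry it implicitly through the type of its child scalar.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model for illustration only; these are NOT Arrow's classes.

// One element of a dense union array: a type code plus a possibly-null child value.
struct ToySlot {
  int8_t type_code;
  std::optional<int> value;  // std::nullopt models a null child value
};

struct ToyUnionScalar {
  bool is_valid;
  std::optional<int8_t> type_code;  // missing when the scalar carries no child
  std::optional<int> value;
};

// Convention 1: a null child yields an invalid union scalar with no child,
// so the information about which child was null is dropped.
ToyUnionScalar GetScalarConv1(const ToySlot& slot) {
  if (!slot.value) return {false, std::nullopt, std::nullopt};
  return {true, slot.type_code, slot.value};
}

// Convention 2: the union scalar is always valid and wraps a (possibly null)
// child scalar, so the child's identity survives.
ToyUnionScalar GetScalarConv2(const ToySlot& slot) {
  return {true, slot.type_code, slot.value};
}

// Broadcast a scalar back to a one-element "array". Without a recorded child,
// fall back to the first declared type code, as described in the thread.
ToySlot Broadcast(const ToyUnionScalar& scalar,
                  const std::vector<int8_t>& declared_codes) {
  int8_t code = scalar.type_code.value_or(declared_codes.front());
  return {code, scalar.value};
}
```

For a null slot with type code 8 in a union declared with codes {4, 8}, broadcasting the convention 1 scalar yields type code 4, while the convention 2 scalar yields 8, matching the behaviour described above.
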