> > It's worth noting that convention 1 doesn't round trip

> Note that https://github.com/apache/arrow/pull/10817 should hopefully
> fix this by adding a `type_code` field to UnionScalar.

We'd need to add a special case to MakeArrayFromScalar, which currently
uses MakeArrayOfNull if !Scalar::is_valid
https://github.com/apache/arrow/blob/db4f23f2179540a490789e9715031578b4b91e16/cpp/src/arrow/array/util.cc#L741

However that's doable and then we would indeed have correct round
tripping under convention 1.



On Thu, Jul 29, 2021 at 11:37 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Le 29/07/2021 à 14:25, Benjamin Kietzman a écrit :
> > Convention 2 seems more correct to me; if a UnionArray cannot
> > contain top level nulls then a UnionScalar should not be nullable.
> > Furthermore, I think that it's reasonable for
> > `MakeNullScalar(some_union_type)->is_valid` to be true, though
> > the doccomment for both `MakeNullScalar` and `MakeArrayOfNull`
> > should include explicit warnings of the special case which unions
> > represent.
> >
> > It's worth noting that convention 1 doesn't round trip through the
> > scalar. Consider the type
> >
> >      t = dense_union({field("i", int8()), field("b", boolean())},
> > /*type_codes=*/{4, 8});
> >
> > If we define an array of this type with a single element like so:
> >
> >      a = ArrayFromJSON(t, R"([{"type_code": 8, "value": null}])");
> >
> > then the scalar returned by `a.GetScalar(0)` is one of:
> >
> >      conv_1 = { .is_valid = false, .value = nullptr }
> >      conv_2 = { .is_valid = true, .value = MakeNullScalar(boolean()) }
> >
> > ... on broadcasting `conv_2` to an array with `MakeArrayFromScalar` we
> > produce a correctly round tripped array with type_codes of 8. On the
> other
> > hand, broadcasting `conv_1` produces
> >
> >      ArrayFromJSON(t, R"([{"type_code": 4, "value": null}])");
> >
> > (In general replacing the type code with whichever type code was
> > declared first.)
>
> Note that https://github.com/apache/arrow/pull/10817 should hopefully
> fix this by adding a `type_code` field to UnionScalar.
>
> Regards
>
> Antoine.
>

Reply via email to