Do other C++ developers have an opinion here?
On Thu, 29 Jul 2021 12:58:08 +0200 Antoine Pitrou <anto...@python.org> wrote: > Hello, > > The Scalar base class has a `bool is_valid` member that is used to > represent null scalars for all types (including the null type). > > A UnionScalar, since it inherits from Scalar, has the following de facto > structure: > > { > // whether the scalar is null > bool is_valid; > // the underlying union value (only set if `is_valid` is true) > std::shared_ptr<Scalar> value; > } > > However, union arrays don't have a top-level validity bitmap. A null in > an union array is always encoded as a "valid" top-level entry with a > null entry in the chosen child array. > > Therefore, there are two possible conventions for representing null > union scalars: > > 1) set `scalar.is_valid` to false and `scalar.value` to nullptr > > 2) set `scalar.is_valid` to true and `scalar.value` to a null scalar for > one the union's child types > > Advantages / drawbacks of convention 1 > -------------------------------------- > > + consistency: `MakeNullScalar(union type)` always returns a scalar with > `is_valid` set to false > > - `union_array.GetScalar(i)` can return a null scalar even when > `union_array.IsNull(i)` would return false > > Advantages / drawbacks of convention 2 > -------------------------------------- > > + more congruent with the union datatype definition > > + `union_array.GetScalar(i)` always returns a valid scalar, just like > `union_array.IsNull(i)` always returns false > > - `MakeNullScalar(union type)` returns a valid scalar (or should it > return an error??) > > - breaks compatibility, since the current behaviour follows convention 1 > (but UnionScalar is probably not widely used...) > > > This came in the context of https://github.com/apache/arrow/pull/10817, > but is unrelated to the changes in this PR. > > Thoughts? Opinions? > > Regards > > Antoine. >