Do other C++ developers have an opinion here?



On Thu, 29 Jul 2021 12:58:08 +0200
Antoine Pitrou <anto...@python.org> wrote:
> Hello,
> 
> The Scalar base class has a `bool is_valid` member that is used to 
> represent null scalars for all types (including the null type).
> 
> A UnionScalar, since it inherits from Scalar, has the following de facto 
> structure:
> 
> {
>    // whether the scalar is null
>    bool is_valid;
>    // the underlying union value (only set if `is_valid` is true)
>    std::shared_ptr<Scalar> value;
> }
> 
> However, union arrays don't have a top-level validity bitmap.  A null in 
> an union array is always encoded as a "valid" top-level entry with a 
> null entry in the chosen child array.
> 
> Therefore, there are two possible conventions for representing null 
> union scalars:
> 
> 1) set `scalar.is_valid` to false and `scalar.value` to nullptr
> 
> 2) set `scalar.is_valid` to true and `scalar.value` to a null scalar for 
> one the union's child types
> 
> Advantages / drawbacks of convention 1
> --------------------------------------
> 
> + consistency: `MakeNullScalar(union type)` always returns a scalar with 
> `is_valid` set to false
> 
> - `union_array.GetScalar(i)` can return a null scalar even when 
> `union_array.IsNull(i)` would return false
> 
> Advantages / drawbacks of convention 2
> --------------------------------------
> 
> + more congruent with the union datatype definition
> 
> + `union_array.GetScalar(i)` always returns a valid scalar, just like 
> `union_array.IsNull(i)` always returns false
> 
> - `MakeNullScalar(union type)` returns a valid scalar (or should it 
> return an error??)
> 
> - breaks compatibility, since the current behaviour follows convention 1 
> (but UnionScalar is probably not widely used...)
> 
> 
> This came in the context of https://github.com/apache/arrow/pull/10817, 
> but is unrelated to the changes in this PR.
> 
> Thoughts? Opinions?
> 
> Regards
> 
> Antoine.
> 



Reply via email to