wesm edited a comment on pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#issuecomment-648435911


   > @wesm why would we have validity at both the top level and the inner level?
   
   Well, the way the specification is written:
   
   * _All_ nested types, including union, are composed from well-formed child 
arrays which may have null values.
   * Additionally, all array types, including all nested array types, have 
their own validity bitmap.
   * In the case of union, a null at the top level would indicate that the type 
of the child is not known. This seems algebraically consistent to me.
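
To make the two levels of validity concrete, here is a minimal sketch in plain Python. It is not the Arrow library API or its memory layout: `Child`, `SparseUnion`, and `describe` are hypothetical names, and bitmaps are modeled as lists of booleans purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Child:
    """A well-formed child array: values plus its own validity bitmap."""
    values: List[Any]
    validity: List[bool]


@dataclass
class SparseUnion:
    """Hypothetical sparse union: every child is as long as the union."""
    type_ids: List[int]    # which child each slot selects
    children: List[Child]
    validity: List[bool]   # union-level bitmap (the point under discussion)

    def describe(self, i: int) -> Tuple[str, Any]:
        if not self.validity[i]:
            # Null at the top level: the type of the child is not known.
            return ("null, type unknown", None)
        child = self.children[self.type_ids[i]]
        if not child.validity[i]:
            # Null inside the child: the type is known, the value is missing.
            return ("null of child type", self.type_ids[i])
        return ("value", child.values[i])


ints = Child(values=[1, 0, 0, 0], validity=[True, False, True, True])
strs = Child(values=["", "", "c", ""], validity=[True, True, True, True])
u = SparseUnion(type_ids=[0, 0, 1, 1], children=[ints, strs],
                validity=[True, True, True, False])

results = [u.describe(i) for i in range(4)]
```

In this model, slot 1 is a null whose type is still known (child 0), while slot 3 is a null whose type is unknown because the union-level bit is unset.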
   
   We can decide to stipulate that union types never have non-valid values at 
the Union cell level, only at the child cell level. But then a union value 
cannot be "made null" by changing the validity bitmap of the Union. From a 
purely algebraic / design perspective that isn't great. From the get-go in the 
project I've been striving for algebraic consistency, e.g. enabling well-formed 
arrays to be composed to create composite types without alteration. Since 
Unions are one of the more seldom-used parts of the project, we can decide to 
explicitly deviate from that, but if we're going to, we need to do it now 
rather than delay reconciling the issue any longer.
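
The trade-off can be sketched in plain Python, again with hypothetical names (`null_via_union_bitmap`, `null_via_child_bitmap`) and list-of-bool bitmaps rather than anything from the Arrow codebase. With a union-level bitmap, nulling a slot leaves the children untouched. Without one, the selected child has to be altered.

```python
from copy import deepcopy

# Illustrative sparse union of an int child and a string child.
union = {
    "type_ids": [0, 1, 0],
    "validity": [True, True, True],  # union-level bitmap
    "children": [
        {"values": [1, 0, 3], "validity": [True, True, True]},
        {"values": ["", "b", ""], "validity": [True, True, True]},
    ],
}


def null_via_union_bitmap(u, i):
    """Null slot i by clearing the union-level bit; children are not modified."""
    out = deepcopy(u)
    out["validity"][i] = False
    return out


def null_via_child_bitmap(u, i):
    """Null slot i when only child-level nulls are allowed: edit the child."""
    out = deepcopy(u)
    out["children"][out["type_ids"][i]]["validity"][i] = False
    return out


a = null_via_union_bitmap(union, 1)
b = null_via_child_bitmap(union, 1)
```

Here `a` shares identical child contents with the original union, whereas `b` differs from it in child 1's validity bitmap.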


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

