wesm commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648435911
> @wesm why would we have validity at both the top level and the inner level?

Well, the way the specification is written:

* _All_ nested types, including union, are composed from well-formed child arrays, which may be nullable.
* Additionally, all array types, including all nested array types, have their own validity bitmap.
* In the case of union, a null at the union level would indicate that the type of the child is not known.

This seems algebraically consistent to me.

We can decide to stipulate that union types never have non-valid values at the Union cell level, only at the child cell level. But then a union value cannot be "made null" by changing the validity bitmap of the Union. From a purely algebraic / design perspective it isn't great. From the get-go in the project I've been striving for algebraic consistency, e.g. enabling well-formed arrays to be composed to create composite types without alteration.

Since Unions are one of the more seldom used parts of the project, we can decide to explicitly deviate from that, but if we are going to, we need to do it now rather than delay reconciling the issue any longer.
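To make the two levels of validity concrete, here is a minimal pure-Python sketch of a sparse union. It is a toy model, not the Arrow implementation or the pyarrow API: the names `ChildArray`, `SparseUnion`, and `describe` are hypothetical, and validity bitmaps are modeled as plain lists of booleans.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class ChildArray:
    """Hypothetical stand-in for a well-formed, nullable child array."""
    values: List[Any]
    validity: List[bool]  # True means the slot holds a valid value


@dataclass
class SparseUnion:
    """Toy sparse union: every child has the same length as the union."""
    type_ids: List[int]
    children: List[ChildArray]
    # None models a union with no validity bitmap of its own.
    validity: Optional[List[bool]] = None

    def describe(self, i: int) -> Tuple[str, Optional[int], Any]:
        # Union-level null: the slot is null and the child type is not known.
        if self.validity is not None and not self.validity[i]:
            return ("union-null", None, None)
        type_id = self.type_ids[i]
        child = self.children[type_id]
        # Child-level null: the type is known, but the value is missing.
        if not child.validity[i]:
            return ("child-null", type_id, None)
        return ("value", type_id, child.values[i])


ints = ChildArray([1, 0, 3, 0], [True, False, True, False])
strs = ChildArray(["", "b", "", ""], [False, True, False, False])
type_ids = [0, 1, 0, 1]

# The same unaltered children, composed with and without a union-level bitmap.
with_bitmap = SparseUnion(type_ids, [ints, strs], [True, True, False, True])
without_bitmap = SparseUnion(type_ids, [ints, strs])

print(with_bitmap.describe(2))     # ('union-null', None, None)
print(without_bitmap.describe(2))  # ('value', 0, 3)
print(with_bitmap.describe(3))     # ('child-null', 1, None)
```

In this model, slot 2 is "made null" purely by clearing a bit in the union's own bitmap, with the children left untouched. If unions are stipulated to have no bitmap of their own, the only way to express a null is through a child, as in slot 3.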