Hi,

As discussed on the mailing list [1], it has been proposed to remove the validity bitmap buffer from Union types in the columnar format specification and instead let value validity be determined exclusively by constituent arrays of the union.
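
To make the proposed semantics concrete, here is a minimal sketch in plain Python (not the Arrow API). It assumes that type ids are equal to child indices, that a child array is modelled as a Python list with None marking a null slot, and that the helper name union_slot_is_null is purely illustrative. Under the proposal, whether a union slot is null is answered only by the selected child:

```python
from typing import Any, List, Optional, Sequence


def union_slot_is_null(
    type_ids: Sequence[int],
    children: Sequence[List[Any]],
    i: int,
    offsets: Optional[Sequence[int]] = None,
) -> bool:
    """Return True if slot i of the union is null.

    type_ids: per-slot index of the child that holds the value
              (assumption: type id == child index)
    children: child arrays; None marks a null child slot
    offsets:  per-slot position within the child (dense union);
              omit for a sparse union, where position i is used
    """
    child = children[type_ids[i]]
    pos = i if offsets is None else offsets[i]
    # No parent validity bitmap is consulted: the child alone decides.
    return child[pos] is None


# Sparse union: every child has the same length as the union.
sparse_children = [[1, None, None], ["a", "b", "c"]]
sparse_type_ids = [0, 1, 0]
sparse_nulls = [
    union_slot_is_null(sparse_type_ids, sparse_children, i) for i in range(3)
]

# Dense union: each child stores only its own values, addressed via offsets.
dense_children = [[1, None], ["x"]]
dense_type_ids = [0, 1, 0]
dense_offsets = [0, 0, 1]
dense_nulls = [
    union_slot_is_null(dense_type_ids, dense_children, i, dense_offsets)
    for i in range(3)
]
```

In both layouts only the last slot is null, because the child it selects holds a null at the addressed position; nothing has to be merged into a separate parent bitmap.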
One of the primary motivations for this is to simplify the creation of unions, since constructing a validity bitmap that merges the information contained in the child arrays' bitmaps is quite complicated.

Note that this change breaks IPC forward compatibility for union types. However, implementations whose union support has so far been spec-compliant would be able to (at their discretion, of course) preserve backward compatibility for deserializing "old" union data in the case that the parent null count of the union is zero. The expected impact of this breakage is low, particularly given that Unions have been absent from integration testing and thus not recommended for anything but ephemeral serialization.

Under the assumption that the MetadataVersion V4 -> V5 version bump is accepted, in order to protect against forward compatibility problems, Arrow implementations would be forbidden from serializing union types using MetadataVersion::V4.

A PR with the changes to Columnar.rst is at [2].

The vote will be open for at least 72 hours.

[ ] +1 Accept changes to Columnar.rst (removing union validity bitmaps)
[ ] +0
[ ] -1 Do not accept changes because...

[1]: https://lists.apache.org/thread.html/r889d7532cf1e1eff74b072b4e642762ad39f4008caccef5ecde5b26e%40%3Cdev.arrow.apache.org%3E
[2]: https://github.com/apache/arrow/pull/7535