Hi,

As discussed on the mailing list [1], it has been proposed to remove
the validity bitmap buffer from Union types in the columnar format
specification and instead let value validity be determined exclusively
by the constituent arrays of the union.

One of the primary motivations for this is to simplify the creation of
unions, since constructing a validity bitmap that merges the
information contained in the child arrays' bitmaps is quite
complicated.
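
To make the proposed semantics concrete, here is a minimal sketch in
plain Python. It does not use any Arrow API; the SparseUnion class and
its fields are hypothetical and only model a sparse union whose
children all have the same length as the union itself. The point is
that the validity of a slot is read from the child selected by the
type id, with no union-level bitmap involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SparseUnion:
    # Hypothetical model, not an Arrow class: one type id per slot,
    # and every child has the same length as the union.
    type_ids: List[int]
    children: List[List[Optional[object]]]  # None marks a null child slot

    def is_valid(self, i: int) -> bool:
        # No union-level bitmap: ask the child selected by the type id.
        return self.children[self.type_ids[i]][i] is not None


union = SparseUnion(
    type_ids=[0, 1, 0],
    children=[[1, None, None], [None, "a", "b"]],
)
validity = [union.is_valid(i) for i in range(len(union.type_ids))]
```

With this model, building a union needs only the type ids and the
children; there is no merged bitmap to compute.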

Note that this change breaks IPC forward compatibility for union types.
However, implementations whose union support has so far been
spec-compliant would be able to (at their discretion, of course)
preserve backward compatibility for deserializing "old" union data in
the case that the parent null count of the union is zero. The expected
impact of this breakage is low, particularly given that unions have
been absent from integration testing and are thus not recommended for
anything but ephemeral serialization.
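
As a rough illustration of the optional backward-compatibility path, a
reader could apply a check along the following lines. This is only a
sketch under assumptions: the MetadataVersion enum here is a local
stand-in rather than the generated Flatbuffers type, and
legacy_union_is_readable is a hypothetical helper name.

```python
from enum import Enum


class MetadataVersion(Enum):
    # Local stand-in for illustration, not the generated Arrow enum.
    V4 = "V4"
    V5 = "V5"


def legacy_union_is_readable(version: MetadataVersion,
                             parent_null_count: int) -> bool:
    """Decide whether union data can be read under the new layout."""
    if version is MetadataVersion.V5:
        # New layout: there is no union-level validity bitmap.
        return True
    # "Old" data: the parent bitmap can be dropped safely only if it
    # marks no slot as null.
    return parent_null_count == 0
```

An implementation that chooses not to support "old" union data at all
would simply reject every V4 union instead.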

Under the assumption that the MetadataVersion V4 -> V5 version bump is
accepted, in order to protect against forward compatibility problems,
Arrow implementations would be forbidden from serializing union types
using MetadataVersion::V4.
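
On the writer side, the restriction could look roughly like the guard
below. Again this is a sketch, not an existing Arrow function: the
enum is a local stand-in, and ensure_union_writable together with the
list of type names are hypothetical.

```python
from enum import Enum
from typing import Iterable


class MetadataVersion(Enum):
    # Local stand-in for illustration, not the generated Arrow enum.
    V4 = "V4"
    V5 = "V5"


def ensure_union_writable(version: MetadataVersion,
                          type_names: Iterable[str]) -> None:
    """Raise if a schema containing a union would be written as V4."""
    if version is MetadataVersion.V4 and "union" in type_names:
        raise ValueError(
            "union types must not be serialized with MetadataVersion V4"
        )
```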

A PR with the changes to Columnar.rst is at [2].

The vote will be open for at least 72 hours.

[ ] +1 Accept changes to Columnar.rst (removing union validity bitmaps)
[ ] +0
[ ] -1 Do not accept changes because...

[1]: https://lists.apache.org/thread.html/r889d7532cf1e1eff74b072b4e642762ad39f4008caccef5ecde5b26e%40%3Cdev.arrow.apache.org%3E
[2]: https://github.com/apache/arrow/pull/7535
