Thanks for taking the initiative on this!

As demonstrated by [1], the wish for an 8-bit Boolean extension type is
long-standing. I think this is a worthwhile addition to Arrow's canonical
extension types.

Before the vote, I would like to see verification that this truly enables
zero-copy to/from NumPy bool arrays in Python.

Ian

On Tue, Jul 16, 2024 at 7:29 AM Joel Lubinitsky <joell...@gmail.com> wrote:

> Hi Arrow devs,
>
> I'm working on adding an extension type for 8-bit booleans, and wanted to
> start a discussion about it here because it could be valuable to others if
> adopted as a canonical extension type.
>
> The native implementation of the Boolean type uses 1 bit to encode each
> value, enabling a very compact representation. This is favorable for many
> workloads, but lots of systems that want to produce/consume Boolean arrays
> use an 8-bit representation internally and are forced to copy/convert at
> their periphery. For these scenarios where zero-copy compatibility is
> important, the 8-bit representation of boolean values may be preferred.
> This can benefit interactions with existing libraries that avoid bit-packing
> column data, such as 1-bit booleans, for parallelization purposes, including
> GPU libraries such as libcudf. The original issue [1] also identifies NumPy
> conversion as a specific use case.
>
> The details of the extension type can be found in the draft PR [2] which
> contains a Go implementation (WIP) and an update to the documentation for
> canonical extension types. I plan to add a C++ implementation as well but
> wanted to open this discussion first.
>
> A quick overview of the layout / semantics proposed in the PR:
> Storage Type: Int8
> Value Semantics: 0 == false, any non-zero value is true
>
> I'd appreciate any feedback here or on the PR. If this all seems reasonable
> then I'll move forward with the next implementation and open up another
> proposal for a formal vote. Thanks!
>
> [1]: https://github.com/apache/arrow/issues/17682
> [2]: https://github.com/apache/arrow/pull/43234
>
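
The layout and value semantics quoted above (Int8 storage, 0 == false, any
non-zero value is true) can be sketched as follows. This is an illustration
in plain Python, not the Go implementation from the PR; the function names
decode and encode are hypothetical, and writing 1 for true on encode is an
assumption of this sketch rather than something the proposal states.

```python
from array import array


def decode(storage):
    """Interpret Int8 storage values: 0 is false, any non-zero value is true."""
    return [value != 0 for value in storage]


def encode(bools):
    """Build Int8 storage from Python bools.

    Assumption: true is written as 1. The proposal only says that any
    non-zero value reads back as true.
    """
    return array("b", (1 if flag else 0 for flag in bools))


# Typecode "b" is a signed 8-bit integer, matching the Int8 storage type.
storage = array("b", [0, 1, -1, 2, 127, -128])
decoded = decode(storage)
print(decoded)  # [False, True, True, True, True, True]

round_trip = decode(encode([True, False, True]))
print(round_trip)  # [True, False, True]
```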
