felipecrv commented on code in PR #43234:
URL: https://github.com/apache/arrow/pull/43234#discussion_r1681161336
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -283,6 +283,28 @@ UUID
A specific UUID version is not required or guaranteed. This extension represents
UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
+8-bit Boolean
+====
+
+Bool8 represents a boolean value using 1 byte (8 bits) to store each value instead of only 1 bit as in
+the native Arrow Boolean type. Although less compact than the native representation, Bool8 may have
+better zero-copy compatibility with various systems that also store booleans using 1 byte.
+
+* Extension name: ``arrow.bool8``.
+
+* The storage type of this extension is ``Int8`` where:
+
+ * **false** is denoted by the value ``0``.
+ * **true** can be specified using any non-zero value.
Review Comment:
When producing a `Bool8` array, integer values of any width MUST be
normalized into the `[0, 1]` range of `int8` values: `0` maps to `0` and every
non-zero value maps to `1` (this is a very simple CPU operation).
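
As a rough illustration of that normalization, here is a minimal Python sketch. The `normalize_to_bool8` helper is hypothetical (not an Arrow API), and a standard-library `array` of signed bytes stands in for the `int8` storage buffer. Note that a plain narrowing cast would not be enough: a value such as `256` truncates to `0` in 8 bits, so the comparison against zero has to happen before narrowing.

```python
from array import array


def normalize_to_bool8(values):
    """Hypothetical helper: map integers of any width to int8 storage in [0, 1]."""
    # "b" is the signed 8-bit typecode. int(v != 0) yields 0 for zero
    # and 1 for every non-zero input, whatever its magnitude or sign.
    return array("b", (int(v != 0) for v in values))


storage = normalize_to_bool8([0, 1, -1, 256, 2**40])
print(list(storage))  # [0, 1, 1, 1, 1]
```
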
A kernel consuming a `Bool8` array can assume values are in the `[0, 1]`
range, but to be robust against less strictly-conformant producers, it's
preferable to use `array[i] != 0` or `array[i] == 0` when converting the `int8`
value to `bool` instead of comparing against `1`.
Compute kernels that would benefit from the `[0, 1]` assumption can rely on
it and document that non-strictly conforming `Bool8` input may lead to bogus
output.
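
To make the trade-off concrete, a small Python sketch (function names are hypothetical, and a standard-library `array` of signed bytes stands in for the `int8` storage). `to_bools` uses the robust `!= 0` test, while `count_true_fast` relies on the `[0, 1]` assumption and returns a wrong count for non-conforming input.

```python
from array import array


def to_bools(storage):
    # Robust decoding: any non-zero int8 value is treated as true.
    return [v != 0 for v in storage]


def count_true_fast(storage):
    # Relies on the [0, 1] assumption: summing raw bytes equals the
    # number of true values only if every byte is exactly 0 or 1.
    return sum(storage)


strict = array("b", [0, 1, 1, 0])   # strictly conforming producer
loose = array("b", [0, 2, -1, 0])   # non-zero values other than 1

print(to_bools(loose))          # [False, True, True, False]
print(count_true_fast(strict))  # 2
print(count_true_fast(loose))   # 1, although two values are true
```
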
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.