felipecrv commented on code in PR #43234:
URL: https://github.com/apache/arrow/pull/43234#discussion_r1681161336


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -283,6 +283,28 @@ UUID
    A specific UUID version is not required or guaranteed. This extension represents
    UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
 
+8-bit Boolean
+=============
+
+Bool8 represents a boolean value using 1 byte (8 bits) to store each value instead of only 1 bit as in
+the native Arrow Boolean type. Although less compact than the native representation, Bool8 may have
+better zero-copy compatibility with various systems that also store booleans using 1 byte.
+
+* Extension name: ``arrow.bool8``.
+
+* The storage type of this extension is ``Int8`` where:
+
+  * **false** is denoted by the value ``0``.
+  * **true** can be specified using any non-zero value.

Review Comment:
   When producing a `Bool8` array, integer values of any width MUST be normalized into the `[0, 1]` range of `int8` values. `0` maps to `0` and every non-zero value maps to `1` (this is a very simple CPU operation).
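A minimal Python sketch of that producer-side normalization, assuming the input is a plain sequence of integers of arbitrary width. `normalize_to_bool8` is a hypothetical helper, not part of any Arrow library, and `array("b", ...)` (signed 8-bit) merely stands in for the `Int8` storage buffer.

```python
from array import array


def normalize_to_bool8(values):
    """Map integers of any width to Int8 storage restricted to [0, 1].

    Hypothetical helper for illustration: 0 stays 0 and every non-zero
    value (negative, or wider than 8 bits) becomes 1.
    """
    # array("b") holds signed 8-bit integers, standing in for Int8 storage.
    return array("b", (1 if v != 0 else 0 for v in values))


# Raw input mixing signs and widths, e.g. copied from an int64 column.
raw = [0, 1, -1, 255, 1 << 40, 0]
storage = normalize_to_bool8(raw)
print(list(storage))  # [0, 1, 1, 1, 1, 0]
```

Normalizing by comparison rather than by narrowing is the point: keeping only the low 8 bits of `256` yields `0`, which would silently turn a true value into false.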
   
   A kernel consuming a `Bool8` array can assume values are in the `[0, 1]` range, but to be robust against less strictly conformant producers, it's preferable to use `array[i] != 0` or `array[i] == 0` when converting the `int8` value to `bool` instead of comparing against `1`.
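The difference between the two comparisons can be sketched in Python, assuming storage that a non-conformant producer filled with `-1` and `2` for true values. `bool8_value` is a hypothetical helper used only for illustration.

```python
from array import array


def bool8_value(storage, i):
    """Read element i of Int8 storage as a bool.

    Compares against 0 rather than 1, so any non-zero value written by
    a less strictly conformant producer is still read as true.
    """
    return storage[i] != 0


# Storage violating the [0, 1] convention at index 2 (-1) and index 3 (2).
storage = array("b", [0, 1, -1, 2])
robust = [bool8_value(storage, i) for i in range(len(storage))]
fragile = [storage[i] == 1 for i in range(len(storage))]
print(robust)   # [False, True, True, True]
print(fragile)  # [False, True, False, False]
```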
   
   Compute kernels that would benefit from the `[0, 1]` assumption can rely on it and document that `Bool8` input that is not strictly conforming may lead to bogus output.
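A small Python sketch of that trade-off, assuming a kernel that counts true values. Summing the raw bytes avoids a per-value comparison but is only correct when every value is `0` or `1`; both function names are hypothetical and do not correspond to an Arrow compute function.

```python
from array import array


def count_true_assuming_01(storage):
    """Count true values by summing the raw Int8 values.

    Correct only for strictly conforming input (every value 0 or 1);
    a kernel like this would document non-conforming input as unsupported.
    """
    return sum(storage)


def count_true_robust(storage):
    """Count true values without assuming the [0, 1] range."""
    return sum(1 for v in storage if v != 0)


conformant = array("b", [1, 0, 1, 1])
non_conformant = array("b", [1, 0, -1, 2])

print(count_true_assuming_01(conformant), count_true_robust(conformant))          # 3 3
print(count_true_assuming_01(non_conformant), count_true_robust(non_conformant))  # 2 3
```

On conformant input both functions agree; on the non-conformant array the summing kernel returns `2` instead of `3`, which is exactly the kind of bogus output the kernel would need to document.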



-- 
This is an automated message from the Apache Git Service.