zeroshade commented on code in PR #43234:
URL: https://github.com/apache/arrow/pull/43234#discussion_r1681654835


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -283,6 +283,28 @@ UUID
    A specific UUID version is not required or guaranteed. This extension represents
    UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
 
+8-bit Boolean
+====
+
+Bool8 represents a boolean value using 1 byte (8 bits) to store each value instead of only 1 bit as in
+the native Arrow Boolean type. Although less compact that the native representation, Bool8 may have
+better zero-copy compatibility with various systems that also store booleans using 1 byte.
+
+* Extension name: ``arrow.bool8``.
+
+* The storage type of this extension is ``Int8`` where:
+
+  * **false** is denoted by the value ``0``.
+  * **true** can be specified using any non-zero value.

Review Comment:
   My preference is for the wording to say that producers `SHOULD` produce 0 
or 1 values rather than `MUST`, simply because there *might* be a system that 
uses values other than 0 and 1. Consumers shouldn't assume 0 and 1; they should 
only assume 0 and non-zero.
   
   In practice the values will almost always be 0 and 1, but I generally favor 
fewer restrictions on an interoperable protocol where possible. Our stated goal 
is zero-copy, so supporting more systems is preferable. Since consumers can 
perform any operation they need by comparing against 0, I don't think we gain 
anything by requiring producers to provide only the values 0 and 1.
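
To make the consumer side concrete, here is a minimal sketch in plain Python. It uses the standard-library `array` module as a stand-in for Int8 storage, not an Arrow API, and the function names are hypothetical. It only illustrates the point that a consumer comparing against 0 handles every producer, while one comparing against 1 silently miscounts:

```python
from array import array

# Hypothetical Bool8 storage: one signed 8-bit integer per value.
# Some imagined producer emits non-zero values other than 1 for "true".
storage = array("b", [0, 1, -1, 2, 0, 127])


def bool8_to_bools(values):
    """Interpret Int8 storage as booleans: 0 is false, any non-zero is true."""
    return [v != 0 for v in values]


def count_true(values):
    """Robust consumer: only assumes 0 versus non-zero."""
    return sum(1 for v in values if v != 0)


def count_true_assuming_one(values):
    """Fragile consumer: assumes true is always stored as exactly 1."""
    return sum(1 for v in values if v == 1)
```

With the storage above, the robust consumer sees four true values, while the fragile one sees a single true value.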
   
   While low-level C code can gain performance by multiplying with `true` and 
`false` for branch-free programming, I don't think a compute kernel should rely 
on that behavior. Most compute systems disallow multiplying bools anyway. 
Allowing zero-copy casts between Bool8 and Int8, without requiring a pass to 
check and convert values, is a good benefit, I think.
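
As a rough illustration of the cast argument, the sketch below again uses plain Python (`array` and `memoryview` from the standard library) rather than any Arrow implementation, and all names are hypothetical. Under the "any non-zero is true" rule, an Int8 buffer can be reinterpreted as Bool8 by sharing the buffer. Under a strict 0/1 rule, the same cast would first need a scan, and possibly a rewritten copy:

```python
from array import array

# Hypothetical Int8 column that we want to treat as Bool8.
int8_column = array("b", [0, 5, -3, 0, 1])

# "Any non-zero is true": the cast is just a view over the same buffer.
# No bytes are copied and no values are inspected.
bool8_view = memoryview(int8_column)


def needs_normalization(values):
    """Scan that a strict 0/1 rule would force on every Int8 -> Bool8 cast."""
    return any(v not in (0, 1) for v in values)


def normalize(values):
    """Copying pass that rewrites every non-zero value to 1."""
    return array("b", (1 if v != 0 else 0 for v in values))
```

Because `bool8_view` shares memory with `int8_column`, a write to one is visible through the other, which is what zero-copy means here. `normalize`, by contrast, allocates a new buffer.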


