felipecrv commented on code in PR #43234:
URL: https://github.com/apache/arrow/pull/43234#discussion_r1681129398


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -283,6 +283,28 @@ UUID
   A specific UUID version is not required or guaranteed. This extension represents
   UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
 
+8-bit Boolean
+====
+
+Bool8 represents a boolean value using 1 byte (8 bits) to store each value instead of only 1 bit as in
+the native Arrow Boolean type. Although less compact that the native representation, Bool8 may have
+better zero-copy compatibility with various systems that also store booleans using 1 byte.
+
+* Extension name: ``arrow.bool8``.
+
+* The storage type of this extension is ``Int8`` where:
+
+  * **false** is denoted by the value ``0``.
+  * **true** can be specified using any non-zero value.

Review Comment:
   > Would these optimizations work if 1 is preferred for true but any nonzero value is still considered valid?
   
   If the underlying memory of a `bool` holds a value other than 0 or 1, multiplying it is undefined behavior (UB).
   
   In summary:
   1) casting an int to `bool` requires the compiler to normalize the value to `0` or `1`.
   2) when using a `bool`, it's usually not an issue for its value to be outside {0, 1}: `if (b)` will work, but `b * my_int` is UB. It's up to the user.
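   
   A minimal C++ sketch of point 1, assuming the Bool8 storage is read as `std::int8_t`. The helper names `Bool8ToBool` and `MaskedValue` are hypothetical and not part of Arrow. Comparing the byte against zero always yields a valid `bool`, so the multiplication stays well-defined whatever non-zero value the producer wrote:
   
   ```cpp
   #include <cstdint>
   
   // Hypothetical helper: interpret one Bool8 storage byte as a C++ bool.
   // The comparison yields false for 0 and true for any other value,
   // so the result is always a valid bool.
   constexpr bool Bool8ToBool(std::int8_t storage) { return storage != 0; }
   
   // Hypothetical helper: use the normalized flag in arithmetic.
   // The flag converts to exactly 0 or 1, so this is well-defined
   // for every possible storage byte.
   constexpr int MaskedValue(std::int8_t storage, int value) {
     return static_cast<int>(Bool8ToBool(storage)) * value;
   }
   ```
   
   With these helpers, `MaskedValue(42, 7)` and `MaskedValue(1, 7)` both give `7`, and `MaskedValue(0, 7)` gives `0`.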
   
   > Perhaps there are some "fastpath" optimizations that can be done by checking the first bit (LE) first to see if the value is 1
   
   Checking that the whole byte is non-zero is a very fast operation, faster than checking a single bit.
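   
   A small C++ sketch of the two checks, with hypothetical helper names that are not Arrow API. It makes no performance measurement. It only shows that the whole-byte test is a single comparison matching the "any non-zero value is true" rule, while a low-bit test needs an extra mask and would read a value such as `2` as false:
   
   ```cpp
   #include <cstdint>
   
   // Whole-byte check: true for every non-zero storage value.
   constexpr bool IsTrueWholeByte(std::int8_t storage) { return storage != 0; }
   
   // Low-bit check (shown for contrast only): masks out bit 0,
   // so even non-zero values such as 2 or -128 come out as false.
   constexpr bool IsTrueLowBitOnly(std::int8_t storage) {
     return (storage & 1) != 0;
   }
   ```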
   
   > Let me know if I'm understanding your suggestion correctly. [...] Unless I'm misunderstanding, if producers MUST produce 0 or 1 values then it's not clear how a consumer would ever receive any other value.
   
   I think I articulated my suggestion very badly here. Let me try again in the next comment.


