felipecrv commented on code in PR #43234:
URL: https://github.com/apache/arrow/pull/43234#discussion_r1678471284
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -283,6 +283,28 @@ UUID
A specific UUID version is not required or guaranteed. This extension
represents
UUIDs as FixedSizeBinary(16) with big-endian notation and does not
interpret the bytes in any way.
+8-bit Boolean
+====
+
+Bool8 represents a boolean value using 1 byte (8 bits) to store each value
instead of only 1 bit as in
+the native Arrow Boolean type. Although less compact that the native
representation, Bool8 may have
+better zero-copy compatibility with various systems that also store booleans
using 1 byte.
+
+* Extension name: ``arrow.bool8``.
+
+* The storage type of this extension is ``Int8`` where:
+
+ * **false** is denoted by the value ``0``.
+ * **true** can be specified using any non-zero value.
Review Comment:
`1` is the value of choice for `true` when casting booleans to integers on
pretty much any system that allows this cast.
In C++, for instance, multiplication with `true` and `false` is a common
pattern in branch-free programming.
```cpp
// Not necessarily the best implementation of max, but it works:
// exactly one of the two products is non-zero, so the sum is
// either a or b and cannot overflow. Optimizing compilers can
// often recognize this as a plain max operation.
int branchfree_max(int a, int b) {
  bool condition = a > b;
  return a * condition + b * !condition;
}
```
This spec could also recommend what to expect when explicitly casting
integer arrays to the 8-bit boolean extension type. The
intX->8bit_bool cast SHOULD [1] canonicalize to 0 and 1 values.
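A minimal sketch of what such a canonicalizing cast could look like, assuming plain `std::vector` buffers in place of Arrow arrays and ignoring validity bitmaps. `CastToBool8Storage` is a hypothetical name used for illustration, not an Arrow API:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): converts int32 values to
// canonical Bool8 storage bytes, so every non-zero input becomes exactly 1.
std::vector<int8_t> CastToBool8Storage(const std::vector<int32_t>& values) {
  std::vector<int8_t> out;
  out.reserve(values.size());
  for (int32_t v : values) {
    // `v != 0` is a bool; converting it to int8_t yields 0 or 1.
    out.push_back(static_cast<int8_t>(v != 0));
  }
  return out;
}
```

Comparing against zero rather than narrowing the integer matters here: truncating a value such as `256` to 8 bits would typically produce `0`, silently turning a true value into false.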
C++ improved upon C's treatment of booleans by requiring that `bool` only
range between `0` and `1` even though implicit casts are allowed.
```cpp
bool cast(int a) {
  return a;
}
```
...becomes this in LLVM IR:
```llvm
%tobool = icmp ne i32 %a, 0
ret i1 %tobool
```
`icmp ne` means integer comparison (not equal) and `0` is the right-hand
side operand. So any `int` other than `0` becomes `true`, but `true` and
`false` tend to be `1` and `0` when a choice has to be made.
[1] SHOULD, not MUST (https://www.rfc-editor.org/rfc/rfc2119). Users should
still watch out for true values other than 1 and act accordingly.
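On the reading side, that advice might translate into something like the following sketch. It again uses a plain `std::vector<int8_t>` as a stand-in for the Bool8 storage, and `CountTrue` is a hypothetical helper, not an Arrow function:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts true values in Bool8 storage without
// assuming that true is always stored as 1.
std::size_t CountTrue(const std::vector<int8_t>& storage) {
  std::size_t count = 0;
  for (int8_t v : storage) {
    count += (v != 0);  // any non-zero byte counts as one true value
  }
  return count;
}
```

Summing the raw bytes instead would only give the right count when the data is already canonical.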