paleolimbot commented on code in PR #43234:
URL: https://github.com/apache/arrow/pull/43234#discussion_r1681032624
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -283,6 +283,28 @@ UUID
A specific UUID version is not required or guaranteed. This extension
represents
UUIDs as FixedSizeBinary(16) with big-endian notation and does not
interpret the bytes in any way.
+8-bit Boolean
+====
+
+Bool8 represents a boolean value using 1 byte (8 bits) to store each value
instead of only 1 bit as in
+the native Arrow Boolean type. Although less compact than the native
representation, Bool8 may have
+better zero-copy compatibility with various systems that also store booleans
using 1 byte.
+
+* Extension name: ``arrow.bool8``.
+
+* The storage type of this extension is ``Int8`` where:
+
+ * **false** is denoted by the value ``0``.
+ * **true** can be specified using any non-zero value.
Review Comment:
I believe bitpacking is another place where it's helpful to have only 0s and
1s (so that you can blindly bitshift the bytes).
> Producers SHOULD produce 0 or 1 values. Consumers MUST treat any non-zero
value as true and 0 as false.
I like this language! The point of this extension (I think) is to unlock
zero-copy for producers in systems that already use int8s for booleans. Unless
emitting only 0 or 1 is ubiquitous in existing implementations, making it a
MUST would force a loop over the data on export (to check) or a copy (if any
values were not zero or one). (I think we're all on the same page about this!)
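
To make the trade-off concrete, here is a rough pure-Python sketch of the check, the copy, and the blind bitshift mentioned above. The helper names (`is_canonical`, `normalize`, `pack_bits`) are made up for illustration and are not part of any Arrow API; the buffer is viewed as unsigned bytes, so an int8 value of -1 shows up as 255.

```python
def is_canonical(values: bytes) -> bool:
    """One pass over the data: True if every byte is already 0 or 1."""
    return all(b <= 1 for b in values)


def normalize(values: bytes) -> bytes:
    """Copy the buffer, mapping any non-zero byte to 1 (the consumer rule)."""
    return bytes(1 if b else 0 for b in values)


def pack_bits(values: bytes) -> bytes:
    """Blindly shift each byte into an LSB-first bitmap.

    This is only correct when every input byte is 0 or 1; other values
    spill into neighbouring bit positions.
    """
    out = bytearray((len(values) + 7) // 8)
    for i, b in enumerate(values):
        out[i // 8] |= (b << (i % 8)) & 0xFF
    return bytes(out)


# Hypothetical bool8 storage buffer containing non-canonical "true" values.
raw = bytes([0, 1, 2, 255, 0, 0, 1, 0, 1])

if is_canonical(raw):
    packed = pack_bits(raw)             # zero-copy friendly path
else:
    packed = pack_bits(normalize(raw))  # needs the extra copy
```

With SHOULD, a producer can hand over `raw` untouched and only a consumer that wants to bitpack pays for `normalize`; with MUST, every producer would have to run something like `is_canonical` before export.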