felipecrv commented on code in PR #43234:
URL: https://github.com/apache/arrow/pull/43234#discussion_r1683301515
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -283,6 +283,28 @@ UUID
A specific UUID version is not required or guaranteed. This extension represents
UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
+8-bit Boolean
+====
+
+Bool8 represents a boolean value using 1 byte (8 bits) to store each value instead of only 1 bit as in
+the native Arrow Boolean type. Although less compact that the native representation, Bool8 may have
+better zero-copy compatibility with various systems that also store booleans using 1 byte.
+
+* Extension name: ``arrow.bool8``.
+
+* The storage type of this extension is ``Int8`` where:
+
+ * **false** is denoted by the value ``0``.
+ * **true** can be specified using any non-zero value.
Review Comment:
>> You don't have to say MUST in the spec. Just say that 1 is the value of
>> choice when you're forced to pick a non-zero value to represent true.
>
> Isn't this equivalent to just saying SHOULD? If you agree we don't have to
> use MUST, then I think we're pretty much saying the same thing. The important
> distinction though, is that if we don't say that producers MUST use [0, 1],
> then consumers cannot and should not assume [0, 1]. Make sense?

I'm saying we should stop using MUST/SHOULD language because people do not
necessarily have RFC2119 in mind when they read specs, but my intent, in
non-RFC speak, remains the same:

- producers strive to produce only [0, 1]-ranged bool8 values and, if they
don't, they are not strictly conformant to the spec
- consumers only assume the [0, 1] range if they understand that producers
that are not strictly conformant (a possibility in an imperfect world) might
break their code.

This is what SHOULD means in RFC2119.
The spec is a strict ideal. The implementations approximate the spec.
```
SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
```
> I think at this point it makes more sense for us to summarize this and put
> it onto the mailing list and see if we can get more input from other people. I
> agree that this shouldn't hold up the spec though.

I thought this would be a non-controversial proposal:
```suggestion
* **true** can be specified using any non-zero value. Preferably ``1``.
```
I don't feel like debating it too much anymore given that people naturally
gravitate towards producing [0, 1]-ranged byte-sized bools no matter what the
spec says.

It might be impossible to get consensus on a text that reconciles
recommending the use of `1` while also saying that any non-zero value can be
interpreted as `true`.
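
To make the producer/consumer split concrete, here is a minimal sketch in plain Python of how the two readings can coexist. The helper names (`normalize_bool8`, `decode_bool8`, `is_strictly_conformant`) are hypothetical and not part of any Arrow API, and the raw `bytes` object only stands in for the `Int8` storage buffer:

```python
# Hypothetical helpers, not part of any Arrow library. `storage` stands in
# for the raw bytes of an Int8 storage buffer; since we only test for zero,
# it does not matter whether the bytes are viewed as signed or unsigned.

def normalize_bool8(storage: bytes) -> bytes:
    """Producer side: emit 1 for every non-zero byte, keep 0 as 0."""
    return bytes(1 if b != 0 else 0 for b in storage)

def decode_bool8(storage: bytes) -> list:
    """Lenient consumer: any non-zero byte is interpreted as true."""
    return [b != 0 for b in storage]

def is_strictly_conformant(storage: bytes) -> bool:
    """Check whether a buffer only uses the preferred values 0 and 1."""
    return all(b in (0, 1) for b in storage)

raw = bytes([0, 1, 2, 255])         # 255 is -1 when viewed as Int8
print(decode_bool8(raw))            # lenient reading of non-preferred values
print(is_strictly_conformant(raw))  # False: 2 and 255 are outside [0, 1]
print(is_strictly_conformant(normalize_bool8(raw)))  # True after normalizing
```

A consumer that wants to rely on [0, 1] (for example, to sum the bytes as a count of true values) could run the strict check first and fall back to the lenient decode otherwise.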