alamb commented on code in PR #251:
URL: https://github.com/apache/parquet-format/pull/251#discussion_r1643208427
##########
LogicalTypes.md:
##########
@@ -554,8 +555,8 @@ The sort order used for `JSON` is unsigned byte-wise
comparison.
### BSON
-`BSON` is used for an embedded BSON document. It must annotate a `binary`
-primitive type. The `binary` data is interpreted as an encoded BSON document as
+`BSON` is used for an embedded BSON document. It must annotate a BYTE_ARRAY
+primitive type. The byte array data is interpreted as an encoded BSON document
as
Review Comment:
```suggestion
primitive type. The BYTE_ARRAY data is interpreted as an encoded BSON
document as
```
##########
LogicalTypes.md:
##########
@@ -228,7 +229,7 @@ integer. A precision too large for the underlying type (see
below) is an error.
warning
* `fixed_len_byte_array`: precision is limited by the array size. Length `n`
can store <= `floor(log_10(2^(8*n - 1) - 1))` base-10 digits
-* `binary`: `precision` is not limited, but is required. The minimum number of
+* `byte_array`: `precision` is not limited, but is required. The minimum
number of
Review Comment:
Perhaps we can change this to name the types using the same names as
elswhere:
```suggestion
* `FIXED_LEN_BYTE_ARRAY `: precision is limited by the array size. Length `n`
can store <= `floor(log_10(2^(8*n - 1) - 1))` base-10 digits
* `BYTE_ARRAY `: `precision` is not limited, but is required. The minimum
number of
##########
LogicalTypes.md:
##########
@@ -211,8 +212,8 @@ unsigned integers with 8, 16, 32, or 64 bit width.
`DECIMAL` annotation represents arbitrary-precision signed decimal numbers of
the form `unscaledValue * 10^(-scale)`.
-The primitive type stores an unscaled integer value. For byte arrays, binary
-and fixed, the unscaled number must be encoded as two's complement using
+The primitive type stores an unscaled integer value. For byte arrays, variable
+and fixed-length, the unscaled number must be encoded as two's complement using
Review Comment:
We can probably make this wording clearer by just using the names directly:
```suggestion
The primitive type stores an unscaled integer value. For BYTE_ARRAY and
FIXED_LEN_BYTE_ARRAY, the unscaled number must be encoded as two's
complement using
```
##########
LogicalTypes.md:
##########
@@ -544,8 +545,8 @@ Embedded types do not have type-specific orderings.
### JSON
-`JSON` is used for an embedded JSON document. It must annotate a `binary`
-primitive type. The `binary` data is interpreted as a UTF-8 encoded character
+`JSON` is used for an embedded JSON document. It must annotate a BYTE_ARRAY
+primitive type. The byte array data is interpreted as a UTF-8 encoded character
Review Comment:
```suggestion
primitive type. The BYTE_ARRAY data is interpreted as a UTF-8 encoded
character
```
##########
LogicalTypes.md:
##########
@@ -59,7 +60,7 @@ Compatibility considerations are mentioned for each
annotation in the correspond
### STRING
-`STRING` may only be used to annotate the binary primitive type and indicates
+`STRING` may only be used to annotate the BYTE_ARRAY primitive type and
indicates
Review Comment:
I think the existing wording is pretty clear of we take "binary primitive
type" to mean BYTE_ARRAY and thus in my opinion this is not a change to the
spec.
Perhaps we could send a note to [email protected] just highlighting
this clarification in case someone wants to chime in and say they read it to
mean FIXED_LENGTH_BYTE_ARRAY was also supported
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]