etseidl commented on code in PR #576:
URL: https://github.com/apache/parquet-format/pull/576#discussion_r3344273125


##########
Encodings.md:
##########
@@ -82,7 +83,7 @@ written first, before the data pages of the column chunk.
 Dictionary page format: the entries in the dictionary using the 
[plain](#PLAIN) encoding.
 
 Data page format: the bit width used to encode the entry ids stored as 1 byte 
(max bit width = 32),
-followed by the values encoded using RLE/Bit packed described above (with the 
given bit width).
+followed by the values encoded using RLE/Bit-Packed described above (with the 
given bit width).

Review Comment:
   ```suggestion
   followed by the values encoded using the RLE/Bit-Packing described above 
(with the given bit width).
   ```
   



##########
Encodings.md:
##########
@@ -68,7 +69,7 @@ stores the data in the following format:
 For native types, this outputs the data as little endian. Floating
     point types are encoded in IEEE.
 
-For the byte array type, it encodes the length as a 4 byte little
+For the byte array type, it encodes the length as a 4-byte little
 endian, followed by the bytes.

Review Comment:
   ```suggestion
   endian integer, followed by the bytes.
   ```



##########
LogicalTypes.md:
##########
@@ -544,7 +545,8 @@ are found during reading, they must be ignored.
 
 ## Embedded Types
 
-Embedded types do not have type-specific orderings.
+Embedded types do not have type-specific orderings beyond the unsigned
+byte-wise comparison of their physical type (`BYTE_ARRAY`).

Review Comment:
   This isn't correct since `VARIANT`, `GEOMETRY`, and `GEOGRAPHY` have 
undefined orderings. Maybe instead
   ```suggestion
   Embedded types do not have type-specific orderings unless otherwise 
specified.
   ```



##########
LogicalTypes.md:
##########
@@ -606,7 +608,7 @@ optional group variant_shredded (VARIANT(1)) {
 ### GEOMETRY
 
 `GEOMETRY` is used for geospatial features in the Well-Known Binary (WKB) 
format
-with linear/planar edges interpolation. It must annotate a `BYTE_ARRAY`

Review Comment:
   As with the other PR, I think "edges" is deliberate



##########
Encodings.md:
##########
@@ -322,8 +323,6 @@ The delta encoding algorithm described above stores a bit 
width per miniblock an
 
 Supported Types: BYTE_ARRAY
 
-This encoding is always preferred over PLAIN for byte array columns.
-

Review Comment:
   This isn't a typo, you're actually changing the spec here. I think this 
needs an actual discussion.



##########
LogicalTypes.md:
##########
@@ -243,7 +243,7 @@ comparison.
 
 *Compatibility*
 
-To support compatibility with older readers, implementations of parquet-format 
should
+To support compatibility with older readers, implementations of parquet-format 
must

Review Comment:
   Another spec change. 



##########
Compression.md:
##########
@@ -89,7 +89,7 @@ switch to the newer, interoperable `LZ4_RAW` codec.
 ### ZSTD
 
 A codec based on the Zstandard format defined by
-[RFC 8478](https://tools.ietf.org/html/rfc8478).  If any ambiguity arises
+[RFC 8878](https://tools.ietf.org/html/rfc8878).  If any ambiguity arises

Review Comment:
   Nice catch!



##########
LogicalTypes.md:
##########
@@ -219,15 +219,15 @@ scale stores the number of digits of that value that are 
to the right of the
 decimal point, and the precision stores the maximum number of digits supported
 in the unscaled value.
 
-If not specified, the scale is 0. Scale must be zero or a positive integer less
-than or equal to the precision. Precision is required and must be a non-zero 
positive
+If not specified, the scale is 0. Scale must be a non-negative integer less
+than or equal to the precision. Precision is required and must be a positive

Review Comment:
   I think this change is less clear, even if correct.



##########
README.md:
##########
@@ -172,7 +172,7 @@ be computed from the schema (i.e. how much nesting there 
is).  This defines the
 maximum number of bits required to store the levels (levels are defined for all
 values in the column).
 
-Two encodings for the levels are supported BIT_PACKED and RLE. Only RLE is now 
used as it supersedes BIT_PACKED.
+Two encodings for the levels are supported: `BIT_PACKED` and `RLE`. Only `RLE` 
is now used as it supersedes `BIT_PACKED`.

Review Comment:
   ```suggestion
   Two encodings for the levels are supported: `BIT_PACKED` and `RLE`. Only 
`RLE` is currently used as it supersedes `BIT_PACKED`.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to