iemejia commented on code in PR #576:
URL: https://github.com/apache/parquet-format/pull/576#discussion_r3345397490
##########
Encodings.md:
##########
@@ -68,7 +69,7 @@ stores the data in the following format:
For native types, this outputs the data as little endian. Floating
point types are encoded in IEEE.
-For the byte array type, it encodes the length as a 4 byte little
+For the byte array type, it encodes the length as a 4-byte little
endian, followed by the bytes.
Review Comment:
Done.
##########
Encodings.md:
##########
@@ -82,7 +83,7 @@ written first, before the data pages of the column chunk.
Dictionary page format: the entries in the dictionary using the
[plain](#PLAIN) encoding.
Data page format: the bit width used to encode the entry ids stored as 1 byte
(max bit width = 32),
-followed by the values encoded using RLE/Bit packed described above (with the
given bit width).
+followed by the values encoded using RLE/Bit-Packed described above (with the
given bit width).
Review Comment:
Done.
##########
Encodings.md:
##########
@@ -322,8 +323,6 @@ The delta encoding algorithm described above stores a bit
width per miniblock an
Supported Types: BYTE_ARRAY
-This encoding is always preferred over PLAIN for byte array columns.
-
Review Comment:
Reverted. You are right, this is a spec change and should be discussed
separately.
##########
LogicalTypes.md:
##########
@@ -219,15 +219,15 @@ scale stores the number of digits of that value that are
to the right of the
decimal point, and the precision stores the maximum number of digits supported
in the unscaled value.
-If not specified, the scale is 0. Scale must be zero or a positive integer less
-than or equal to the precision. Precision is required and must be a non-zero
positive
+If not specified, the scale is 0. Scale must be a non-negative integer less
+than or equal to the precision. Precision is required and must be a positive
Review Comment:
Reverted. Agreed, the original wording is clearer.
##########
LogicalTypes.md:
##########
@@ -243,7 +243,7 @@ comparison.
*Compatibility*
-To support compatibility with older readers, implementations of parquet-format
should
+To support compatibility with older readers, implementations of parquet-format
must
Review Comment:
Reverted. You are right, this is a spec change.
##########
LogicalTypes.md:
##########
@@ -544,7 +545,8 @@ are found during reading, they must be ignored.
## Embedded Types
-Embedded types do not have type-specific orderings.
+Embedded types do not have type-specific orderings beyond the unsigned
+byte-wise comparison of their physical type (`BYTE_ARRAY`).
Review Comment:
Done. Applied your suggestion.
##########
LogicalTypes.md:
##########
@@ -606,7 +608,7 @@ optional group variant_shredded (VARIANT(1)) {
### GEOMETRY
`GEOMETRY` is used for geospatial features in the Well-Known Binary (WKB)
format
-with linear/planar edges interpolation. It must annotate a `BYTE_ARRAY`
Review Comment:
Reverted to `edges`.
##########
README.md:
##########
@@ -172,7 +172,7 @@ be computed from the schema (i.e. how much nesting there
is). This defines the
maximum number of bits required to store the levels (levels are defined for all
values in the column).
-Two encodings for the levels are supported BIT_PACKED and RLE. Only RLE is now
used as it supersedes BIT_PACKED.
+Two encodings for the levels are supported: `BIT_PACKED` and `RLE`. Only `RLE`
is now used as it supersedes `BIT_PACKED`.
Review Comment:
Done.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]