aihuaxu commented on code in PR #481:
URL: https://github.com/apache/parquet-format/pull/481#discussion_r1918976484
##########
VariantEncoding.md:
##########
@@ -399,6 +399,7 @@ The Decimal type contains a scale, but no precision. The
implied precision of a
| Timestamp | timestamp with time zone | `22` |
TIMESTAMP(isAdjustedToUTC=true, NANOS) | 8-byte little-endian
|
| TimestampNTZ | timestamp without time zone | `23` |
TIMESTAMP(isAdjustedToUTC=false, NANOS) | 8-byte little-endian
|
| UUID | uuid | `24` | UUID
| 16-byte big-endian
|
+| Fixed(L) | Byte array of length L | `25` |
FIXED_LEN_BYTE_ARRAY[L] | 4 byte little-endian size L, followed by
length-L big-endian bytes |
Review Comment:
big-endian bytes: this is to keep in sync with the others like UUID which is
a fixed(16). And it makes sense to write a bytes in big endian since the
engine can write the bytes in the buffer in order, not requiring buffering the
whole string.
The required size: I initially avoided adding the fixed(L) type because I
believed we couldn't support fixed(L) if we try to include L in the type
description, as there wouldn't be enough bits available to represent the
length, given that only 5 bits are allocated for the type.
The way here to add the fixed(L) type is to add the length in the value
field - we are duplicating the length for each value but I don't see other
ways.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]