emkornfield commented on code in PR #481:
URL: https://github.com/apache/parquet-format/pull/481#discussion_r1918991446
##########
VariantEncoding.md:
##########
@@ -399,6 +399,7 @@ The Decimal type contains a scale, but no precision. The implied precision of a
| Timestamp | timestamp with time zone | `22` | TIMESTAMP(isAdjustedToUTC=true, NANOS) | 8-byte little-endian |
| TimestampNTZ | timestamp without time zone | `23` | TIMESTAMP(isAdjustedToUTC=false, NANOS) | 8-byte little-endian |
| UUID | uuid | `24` | UUID | 16-byte big-endian |
+| Fixed(L) | Byte array of length L | `25` | FIXED_LEN_BYTE_ARRAY[L] | 4 byte little-endian size L, followed by length-L big-endian bytes |
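Here is a minimal Python sketch of the proposed Fixed(L) row. It assumes the usual primitive value header, where the 6-bit type ID is shifted left by 2 and basic_type `0` sits in the low 2 bits. It also assumes the type ID `25` from this diff, which is not part of a released spec. The names `encode_fixed` and `decode_fixed` are hypothetical. The sketch shows that the payload bytes are copied through verbatim, so the "big-endian" qualifier does not change the output.

```python
import struct

# Assumption: type ID 25 and this layout come from the row proposed in the
# diff above; they are not part of a released spec.
FIXED_TYPE_ID = 25
BASIC_TYPE_PRIMITIVE = 0


def encode_fixed(data: bytes) -> bytes:
    """Encode raw bytes as a (proposed) Fixed(L) variant value."""
    # Primitive header: type ID in the upper 6 bits, basic_type in the low 2 bits.
    header = (FIXED_TYPE_ID << 2) | BASIC_TYPE_PRIMITIVE
    # 4-byte little-endian length L, as proposed.
    length = struct.pack("<I", len(data))
    # Payload copied as-is: each byte stands alone, so no byte order applies.
    return bytes([header]) + length + data


def decode_fixed(buf: bytes) -> bytes:
    """Decode a (proposed) Fixed(L) value back to its raw bytes."""
    header = buf[0]
    if header & 0b11 != BASIC_TYPE_PRIMITIVE or header >> 2 != FIXED_TYPE_ID:
        raise ValueError("not a Fixed(L) value")
    (length,) = struct.unpack_from("<I", buf, 1)
    payload = buf[5:5 + length]
    if len(payload) != length:
        raise ValueError("truncated Fixed(L) payload")
    return payload


print(encode_fixed(b"\x01\x02\x03").hex())  # 6403000000010203
```

Every value pays 1 header byte plus 4 length bytes, even an empty one. That fixed cost is the overhead discussed below.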
Review Comment:
- I'm not sure endianness makes sense for Fixed(L). Doesn't endianness only apply to multi-byte values? In Fixed(L), each byte is independent.
- I think the current proposal is reasonable and matches how things like decimals with arbitrary precision are encoded. It is also consistent with the string representation. If we are worried about the overhead of the 4-byte length, we could use a variable-width encoding scheme, or have two types: Short-fixed(L) with a 1-byte length and Fixed(L) with a 4-byte length. Unfortunately, IIUC we can't have a "short fixed(L)" the way we have short strings, because I think we already use the entire value range there.
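This rough Python sketch puts numbers on the overhead trade-off. It compares three length prefixes:

- the proposed 4-byte length;
- a hypothetical 1-byte length for a Short-fixed(L) type, valid only for L ≤ 255;
- unsigned LEB128.

LEB128 is only one example of a variable-width scheme. Neither the spec nor this comment defines it. The helper names are hypothetical.

```python
from typing import Optional


def uleb128(n: int) -> bytes:
    """Unsigned LEB128: 7 data bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte
            return bytes(out)


def length_prefix_overhead(length: int) -> dict:
    """Bytes spent on the length prefix alone under each (hypothetical) scheme."""
    short: Optional[int] = 1 if length <= 0xFF else None  # Short-fixed(L) can't hold it
    return {
        "fixed_4_byte": 4,
        "short_1_byte": short,
        "varint": len(uleb128(length)),
    }


for L in (16, 200, 100_000):
    print(L, length_prefix_overhead(L))
```

For a 16-byte value, either alternative cuts the prefix from 4 bytes to 1. LEB128 needs a third byte only once L reaches 16,384. The cost is a data-dependent prefix width, which readers must decode before they can locate the payload.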