jhorstmann commented on code in PR #241:
URL: https://github.com/apache/parquet-format/pull/241#discussion_r2877924765
##########
src/main/thrift/parquet.thrift:
##########
@@ -319,6 +319,11 @@ struct ListType {} // see LogicalTypes.md
struct EnumType {} // allowed for BYTE_ARRAY, must be encoded with UTF-8
struct DateType {} // allowed for INT32
struct Float16Type {} // allowed for FIXED[2], must be encoded as raw FLOAT16
bytes (see LogicalTypes.md)
+struct FixedSizeListType { // allowed for FIXED_LEN_BYTE_ARRAY; see
LogicalTypes.md
+ 1: required Type type; // element type (fixed-width primitive; must
not be BOOLEAN, INT96, or BYTE_ARRAY)
Review Comment:
Adding a logical type could work, and it would then even support nested
lists or matrices. It's not immediately obvious, but `Type` could not support
that since the length of `FIXED_LEN_BYTE_ARRAY` is stored in `SchemaElement`.
What I don't like is that here, the logical type is used to influence the
physical layout, where as elsewhere, a PLAIN encoded INT32 with logical type
INT_8 would still be stored using 4 bytes.
Hm, thinking out loud a bit, the physical width is already defined by
`type_length of FIXED_LEN_BYTE_ARRAY / num_values`. The logical type should
then be enough to interpret these bytes, without the `Type` field. The only
blocker for that is that there is no logical type annotation to indicate
`FLOAT` or `DOUBLE`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]