jhorstmann commented on code in PR #241:
URL: https://github.com/apache/parquet-format/pull/241#discussion_r2877924765


##########
src/main/thrift/parquet.thrift:
##########
@@ -319,6 +319,11 @@ struct ListType {}    // see LogicalTypes.md
 struct EnumType {}    // allowed for BYTE_ARRAY, must be encoded with UTF-8
 struct DateType {}    // allowed for INT32
 struct Float16Type {} // allowed for FIXED[2], must be encoded as raw FLOAT16 
bytes (see LogicalTypes.md)
+struct FixedSizeListType {        // allowed for FIXED_LEN_BYTE_ARRAY; see 
LogicalTypes.md
+    1: required Type type;        // element type (fixed-width primitive; must 
not be BOOLEAN, INT96, or BYTE_ARRAY)

Review Comment:
   Adding a logical type could work, and it would then even support nested 
lists or matrices. It's not immediately obvious, but `Type` could not support 
that since the length of `FIXED_LEN_BYTE_ARRAY` is stored in `SchemaElement`.
   
   What I don't like is that here, the logical type is used to influence the 
physical layout, where as elsewhere, a PLAIN encoded INT32 with logical type 
INT_8 would still be stored using 4 bytes.
   
   Hm, thinking out loud a bit, the physical width is already defined by 
`type_length of FIXED_LEN_BYTE_ARRAY / num_values`. The logical type should 
then be enough to interpret these bytes, without the `Type` field. The only 
blocker for that is that there is no logical type annotation to indicate 
`FLOAT` or `DOUBLE`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to