rok commented on code in PR #241:
URL: https://github.com/apache/parquet-format/pull/241#discussion_r1626834692


##########
LogicalTypes.md:
##########
@@ -255,6 +255,16 @@ The primitive type is a 2-byte fixed length binary.
 
 The sort order for `FLOAT16` is signed (with special handling of NANs and 
signed zeros); it uses the same 
[logic](https://github.com/apache/parquet-format#sort-order) as `FLOAT` and 
`DOUBLE`.
 
+### FIXED_SIZE_LIST

Review Comment:
   > * This solves the problem of redundant R-Levels. Since it's just a 
primitive column, no r-level considerations have to be taken into account.
   
   This is the main reason I'd like to propose this type, see 
https://github.com/apache/arrow/issues/34510.
   
   > * Cannot create fixed size lists of nested types (e.g., list of structs). 
I see that this isn't necessary for tensors or embedding vectors, but shouldn't 
the feature be extensible for other scenarios as well? This limits the 
composability of the feature. I can now create a struct of fixed size lists, 
but not a fixed size list of structs.
   
   Lack of composability is a downside, but I think it's still worth the 
compromise. I've not seen need for fixed_size_list(struct) in tensor computing, 
but that's probably just because it's not available.
   
   > * Cannot have null elements in fixed size lists. This might not be desired 
for all lists, but there can be use cases where having null values in them is 
preferrable.
   
   In tensor computation this is usually addressed with bitmasks, which can be 
stored as a `fixed_size_list(binary, num_values)`.
   
   > * Parquet has a concept for (non-fixed size) lists. It is conceptually 
weird that fixed size lists are totally different from (non-fixed size) lists.
   
   Perhaps we should call this type `FixedSizeArray` to disambiguate?
   
   
   > I'm now feeling that maybe wrapping a `Vector[PrimitiveType, Size]` is 
also ok, but currently representing this is a bitweird in the model. May I ask 
would a `Vector` having data below?
   > 
   > ```
   > 1. [1, 1, 1], [null, 1, 1] <-- data with null
   > 2. null, [1, 1, 1] <-- null vector
   > ```
   > 
   > And would vector contains a "nested" vector?
   
   I think case 2. is ok, but case 1. should be expressed with a separate null 
bitmask that's not part of the type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to