rok commented on PR #241:
URL: https://github.com/apache/parquet-format/pull/241#issuecomment-2148750273

   Apologies for taking a while to reply.
   
   For the sake of discussion, I've split this into two cases: 
`FixedSizeListType` (the length is constant) and `VariableSizeListType` (the 
length differs per row). I would move `VariableSizeListType` into a separate 
PR, if we decide it is needed at all alongside `ListType`.
   
   > One thing to perhaps give thought to is how this might represent nested 
lists, say you wanted to encode a m by n matrix, would you just encode this as 
a `m * n` list or do we want to support this as a first-class concept?
   
   We could start with a more general multidimensional array definition and 
treat a list as a 1-dimensional array. The additional metadata required would 
not be much. I'm just a bit worried about validation and striding logic 
bleeding into Parquet implementations. Does anyone have other input or opinions?
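
To make the concern concrete, here is a minimal sketch of the validation and striding logic a reader would need if an m by n matrix were stored as a flat fixed-size list of `m * n` values. It assumes row-major order; the names `validate_fixed_size` and `flat_index` are hypothetical and not part of any Parquet API.

```python
from math import prod


def validate_fixed_size(values, shape):
    """Check that a flat row holds exactly prod(shape) values (hypothetical helper)."""
    expected = prod(shape)
    if len(values) != expected:
        raise ValueError(f"expected {expected} values for shape {shape}, got {len(values)}")


def flat_index(indices, shape):
    """Row-major offset of a multi-dimensional index inside the flat list (assumed layout)."""
    if len(indices) != len(shape):
        raise ValueError("index rank does not match shape rank")
    offset = 0
    for i, dim in zip(indices, shape):
        if not 0 <= i < dim:
            raise IndexError(f"index {i} out of range for dimension of size {dim}")
        offset = offset * dim + i
    return offset


# One row holding a 2 by 3 matrix, stored as a fixed-size list of 6 values.
shape = (2, 3)
row = [1, 2, 3, 4, 5, 6]
validate_fixed_size(row, shape)
element = row[flat_index((1, 2), shape)]  # second matrix row, third column
```

With a plain 1-dimensional fixed-size list, only the length check remains; the index arithmetic is the part that would otherwise have to live in each implementation or in the application on top.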
   
   > I had perhaps been anticipating that fixed size list would be a variant of 
"REPEATED" as opposed to a physical type, that is just able to avoid 
incrementing the max_def_level and max_rep_level. This would make it 
significantly more flexible I think, although I concede it will make it harder 
to implement.
   
   That's interesting. What would you expect performance-wise with this 
approach?
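
For reference, a rough sketch of how I read the difference, assuming a single repeated leaf with no nulls and no empty lists. A regular repeated field needs one repetition level per value to mark row boundaries, while a fixed-size variant could recover the boundaries from the declared size alone. All function names here are hypothetical illustrations, not an existing implementation.

```python
def repetition_levels(rows):
    """Dremel-style repetition levels for one repeated leaf (non-empty rows only)."""
    levels = []
    for row in rows:
        if not row:
            raise ValueError("empty lists need definition levels, not covered in this sketch")
        # 0 starts a new row; 1 continues the current list.
        levels.extend([0] + [1] * (len(row) - 1))
    return levels


def rows_from_levels(values, levels):
    """Rebuild rows by scanning the repetition levels alongside the values."""
    rows = []
    for value, level in zip(values, levels):
        if level == 0:
            rows.append([value])
        else:
            rows[-1].append(value)
    return rows


def rows_from_fixed_size(values, size):
    """Rebuild rows from the declared list size alone; no levels are stored or read."""
    if size <= 0 or len(values) % size != 0:
        raise ValueError("value count is not a multiple of the fixed list size")
    return [values[i:i + size] for i in range(0, len(values), size)]


rows = [[1, 2, 3], [4, 5, 6]]
values = [v for row in rows for v in row]
levels = repetition_levels(rows)  # [0, 1, 1, 0, 1, 1]

assert rows_from_levels(values, levels) == rows
assert rows_from_fixed_size(values, 3) == rows
```

The fixed-size path skips both writing and decoding the level stream, which is where I would guess any saving comes from, but that is an assumption and not a measurement.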


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

