coastalwhite commented on PR #241:
URL: https://github.com/apache/parquet-format/pull/241#issuecomment-2425895697

   I like the general idea of moving `FixedSizeList` partially away from `List` 
and towards `FixedSizeBinary`, but I doubt it would lead to serious speedups or 
simplification.
   
   The `List`-based deserializer already batches decoding most of the time, 
much as this change would allow, although the change would also let readers 
skip many checks that happen before the actual deserialization takes place. We 
would still need to support the old path for a long time, since many people 
write Parquet files using old versions of the Parquet specification and keep 
using old Parquet files.
   
   The one potentially large upside I can imagine is getting dictionary 
encoding for arrays, but I am not sure how common that will be in real-world 
scenarios.
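
   To illustrate the dictionary-encoding point, here is a minimal sketch of 
deduplicating whole fixed-size arrays instead of individual elements. It is 
not Parquet's actual encoding or any library's API: the function name 
`dictionary_encode_fixed_size` and the flat-buffer input layout are assumptions 
made purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def dictionary_encode_fixed_size(
    values: Sequence[float], width: int
) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Dictionary-encode a flat buffer of fixed-size arrays (hypothetical helper).

    `values` holds the arrays back to back, `width` elements each.
    Returns (dictionary of distinct arrays, one index per array).
    """
    if width <= 0 or len(values) % width != 0:
        raise ValueError("buffer length must be a positive multiple of width")

    dictionary: List[Tuple[float, ...]] = []
    index_of: Dict[Tuple[float, ...], int] = {}
    indices: List[int] = []

    for start in range(0, len(values), width):
        # Treat each run of `width` elements as a single dictionary entry.
        item = tuple(values[start:start + width])
        idx = index_of.get(item)
        if idx is None:
            idx = len(dictionary)
            index_of[item] = idx
            dictionary.append(item)
        indices.append(idx)

    return dictionary, indices
```

   The benefit only materializes when entire arrays repeat, which is the 
open question about real-world data.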
   
   In general, I would say I am in favor, although I am not yet 100% convinced 
that the added complexity will result in significant performance, file size, or 
other benefits.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


