coastalwhite commented on PR #241: URL: https://github.com/apache/parquet-format/pull/241#issuecomment-2425895697
I like the general idea of moving `FixedSizeList` partially away from `List` and towards `FixedSizeBinary`, but I doubt it would lead to serious speedups or simplification. Most of the time, the `List`-based deserializer already batches decoding in much the same way this change would allow, although the change would let us skip many of the checks that happen before the actual deserialization takes place. We would also still need to support the old path for a long time, since many people write Parquet files with old versions of the Parquet specification and, more generally, keep using old Parquet files.

The one potentially large upside I can imagine is getting dictionary encoding for arrays, but I am not sure how common that will be in real-world scenarios.

In general, I would say I am in favor, although I am not yet 100% convinced that the added complexity will result in significant performance, file size, or other benefits.
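
To make the trade-off concrete, here is a toy sketch in Python. It is not how any Parquet reader is implemented and it ignores definition levels, nulls, and pages entirely; the function names (`decode_via_list`, `decode_fixed`, `dictionary_encode`) are hypothetical. It only illustrates the two points above under those assumptions: a `List`-style path has to walk repetition levels and check each row's length, while a fixed-width path can slice by stride, and treating each array as one value is what would make dictionary encoding of whole arrays possible.

```python
from typing import List, Sequence, Tuple


def decode_via_list(values: Sequence[int], rep_levels: Sequence[int], width: int) -> List[List[int]]:
    """Toy List-style path: rebuild rows from repetition levels.

    Assumes a single-level list where repetition level 0 starts a new row
    and level 1 continues the current one. Every row length is checked.
    """
    if len(values) != len(rep_levels):
        raise ValueError("values and repetition levels differ in length")
    rows: List[List[int]] = []
    for value, rep in zip(values, rep_levels):
        if rep == 0:
            # A new row starts, so the previous one must already be complete.
            if rows and len(rows[-1]) != width:
                raise ValueError("row does not have the fixed width")
            rows.append([value])
        elif not rows:
            raise ValueError("stream must start with repetition level 0")
        else:
            rows[-1].append(value)
    if rows and len(rows[-1]) != width:
        raise ValueError("last row does not have the fixed width")
    return rows


def decode_fixed(values: Sequence[int], width: int) -> List[List[int]]:
    """Toy fixed-width path: one length check up front, then slice by stride."""
    if width <= 0 or len(values) % width != 0:
        raise ValueError("value count is not a multiple of the width")
    return [list(values[i:i + width]) for i in range(0, len(values), width)]


def dictionary_encode(rows: Sequence[Sequence[int]]) -> Tuple[List[Tuple[int, ...]], List[int]]:
    """Toy dictionary encoding where each whole array is one dictionary entry."""
    dictionary: List[Tuple[int, ...]] = []
    index_of = {}
    indices: List[int] = []
    for row in rows:
        key = tuple(row)
        if key not in index_of:
            index_of[key] = len(dictionary)
            dictionary.append(key)
        indices.append(index_of[key])
    return dictionary, indices
```

Both decoders return the same rows for well-formed input; the difference is only where the validation work happens. Whether the dictionary path pays off depends on how often identical arrays repeat in real data, which is the open question.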
