XiangpengHao commented on issue #5854:
URL: https://github.com/apache/arrow-rs/issues/5854#issuecomment-2154921008

   FWIW, by simply moving this field to the heap (i.e., `Option<Statistics>` -> 
`Option<Box<Statistics>>`), we can get a 30% performance improvement (as will 
be shown in blog #5770).
   
https://github.com/apache/arrow-rs/blob/087f34b70e97ee85e1a54b3c45c5ed814f500b0a/parquet/src/format.rs#L3407
   
   The `Option<Statistics>` occupies 136 bytes even if the file does not have 
stats at all (i.e., the field is `None`); this not only slows down decoding 
(due to poor memory locality) but also causes high memory consumption when 
decoding metadata (parquet-rs consumes 10 MB of memory per MB of metadata).
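
   To make the size argument concrete, here is a minimal sketch that compares the two layouts. The `Statistics` struct below is a simplified, hypothetical stand-in (its field list is an assumption, not the thrift-generated definition), and the helper function names are made up for illustration. The exact inline size depends on the real field list, the target, and the compiler version, so the sketch only relies on what Rust guarantees: `Option<Box<T>>` is a single nullable pointer.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the thrift-generated `Statistics` struct.
/// The fields are illustrative only; the real definition lives in
/// parquet/src/format.rs and may differ.
#[allow(dead_code)]
#[derive(Default)]
pub struct Statistics {
    pub max: Option<Vec<u8>>,
    pub min: Option<Vec<u8>>,
    pub null_count: Option<i64>,
    pub distinct_count: Option<i64>,
}

/// Bytes reserved in the parent struct when the field is stored inline.
/// This space is paid for every column chunk, even when the value is `None`.
pub fn inline_field_size() -> usize {
    size_of::<Option<Statistics>>()
}

/// Bytes reserved in the parent struct when the field is boxed.
/// `None` is encoded as a null pointer, so no heap allocation happens
/// unless statistics are actually present.
pub fn boxed_field_size() -> usize {
    size_of::<Option<Box<Statistics>>>()
}
```

   Printing `inline_field_size()` and `boxed_field_size()` from a `main` function shows the gap on a given target. The trade-off is one extra allocation and pointer hop when statistics are present, in exchange for a much smaller parent struct when they are absent.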
   
   
   I think this example motivates custom parquet type definitions and, thus, 
a custom thrift decoder.


