Re: [I] Reduce Allocations When Reading Parquet Metadata [arrow-rs]

via GitHub Fri, 17 May 2024 01:52:57 -0700


marcin-krystianc commented on issue #5775:
URL: https://github.com/apache/arrow-rs/issues/5775#issuecomment-2117069394


   > > > What would really be cool is to be able to skip decoding entirely for 
metadata not needed for a particular operation (eg skip decoding of 
ColumnChunks that are not being projected by the reader)
   > > 
   > > 
   > > This would effectively achieve that by making the decoding step as 
inexpensive as skipping over data. The nature of variable length encodings 
means you have to scan through regardless
   > 
   > Yeah true, looking at the compact encoding again I guess there is no way 
to skip variable length structs/list elements without decoding them
   
   I looked at this problem a few months ago, and my conclusion was that half of 
the time is spent decoding the Thrift data and the other half is spent on 
object allocation and initialisation (this was in C++). So without changing 
the metadata format or using a different encoding, only about a 2x improvement 
is possible, since removing the allocation cost entirely would still leave the 
decoding half.
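
   To illustrate the point quoted above about variable-length encodings, here 
is a minimal sketch of a ULEB128 varint (the integer encoding used by the 
Thrift compact protocol). The function names `decode_varint` and `skip_varint` 
are hypothetical and are not taken from arrow-rs or any Thrift library. The 
sketch only shows that skipping a value still has to inspect every byte to 
find where it ends, so it cannot jump ahead the way a length-prefixed format 
could.

```rust
/// Decode one ULEB128 varint starting at `pos`.
/// Returns the decoded value and the position of the next unread byte,
/// or `None` if the buffer ends early or the value does not fit in a u64.
/// (Hypothetical helper for illustration only.)
fn decode_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(pos)?;
        pos += 1;
        if shift >= 64 {
            return None; // too many continuation bytes for a u64
        }
        // The low 7 bits carry payload; the high bit marks "more bytes follow".
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
        shift += 7;
    }
}

/// Skip one varint starting at `pos` without building its value.
/// Returns the position of the next unread byte.
/// Note that it still reads every byte: the length is only known once
/// a byte without the continuation bit is found.
/// (Hypothetical helper for illustration only.)
fn skip_varint(buf: &[u8], mut pos: usize) -> Option<usize> {
    loop {
        let byte = *buf.get(pos)?;
        pos += 1;
        if byte & 0x80 == 0 {
            return Some(pos);
        }
    }
}
```

   For the bytes `[0xAC, 0x02, 0x05]`, `decode_varint(&buf, 0)` yields the 
value 300 and next position 2, and `skip_varint(&buf, 0)` also has to walk 
both bytes to arrive at position 2. The only work saved by skipping is the 
shift and OR, which is why making decoding cheap is roughly equivalent to 
skipping here.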


