marcin-krystianc commented on issue #5775: URL: https://github.com/apache/arrow-rs/issues/5775#issuecomment-2117069394
> > > What would really be cool is to be able to skip decoding entirely for metadata not needed for a particular operation (eg skip decoding of ColumnChunks that are not being projected by the reader)
> >
> > This would effectively achieve that by making the decoding step as inexpensive as skipping over data. The nature of variable length encodings means you have to scan through regardless
>
> Yeah true, looking at the compact encoding again I guess there is no way to skip variable length structs/list elements without decoding them

I looked at this problem a few months ago, and my conclusion was that half of the time is spent decoding the Thrift data and the other half is spent on object allocation and initialisation (this was in C++). So without changing the metadata format or switching to a different encoding, it is only possible to get something like a 2x improvement.
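
To illustrate the point quoted above about variable-length encodings, here is a minimal sketch in Rust. It is not the arrow-rs or Thrift implementation, and `skip_varint` and `decode_varint` are hypothetical names chosen for this example. It assumes only that integers are stored as ULEB128 varints, which is what the Thrift compact protocol uses. Skipping a value has to inspect every byte for the continuation bit, so it walks the same bytes as decoding and only saves the cost of assembling the value.

```rust
/// Skips one ULEB128 varint starting at `pos` and returns the offset just past it.
/// Returns `None` if the buffer ends before a terminating byte is found.
/// (Hypothetical helper for illustration only.)
fn skip_varint(buf: &[u8], pos: usize) -> Option<usize> {
    let mut i = pos;
    loop {
        let byte = *buf.get(i)?;
        i += 1;
        // The high bit is a continuation flag; the final byte has it cleared.
        // There is no length prefix, so every byte must be inspected.
        if byte & 0x80 == 0 {
            return Some(i);
        }
    }
}

/// Decodes one ULEB128 varint starting at `pos`.
/// Returns the value and the offset just past it, or `None` on truncated
/// input or if the encoding is longer than a u64 can hold.
/// (Hypothetical helper for illustration only.)
fn decode_varint(buf: &[u8], pos: usize) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    let mut i = pos;
    loop {
        let byte = *buf.get(i)?;
        i += 1;
        if shift >= 64 {
            return None;
        }
        // Same byte-by-byte walk as `skip_varint`, plus accumulating 7 bits per byte.
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i));
        }
        shift += 7;
    }
}
```

For a buffer holding the varints 1, 300 and 127 back to back (`[0x01, 0xAC, 0x02, 0x7F]`), both functions visit the same bytes. The only difference is whether the value is accumulated, which is why skipping unprojected fields in this encoding is not much cheaper than decoding them.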
