Re: [PR] [thrift-remodel] Decoding of page indexes [arrow-rs]

via GitHub Mon, 18 Aug 2025 09:57:37 -0700


etseidl commented on PR #8160:
URL: https://github.com/apache/arrow-rs/pull/8160#issuecomment-3197722865


   > What drives the need to convert to array of structs? Is that the 
representation of the ColumnIndex in Rust or is it something about how the 
thrift is encoded?
   
   Yes...parquet-rs takes the existing `ColumnIndex`, which is a struct of 
arrays, each `num_pages` in length, and turns it into `num_pages` `PageIndex` 
objects contained in a `NativeIndex`, which is then wrapped in an `Index` 
enum variant. While we're remodeling we could blow that up, but I think that 
would have a pretty big ripple effect downstream.
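
To make the shape of that conversion concrete, here is a minimal sketch of turning a struct of arrays into one struct per page. The type and function names (`ColumnIndexSoA`, `PageStats`, `to_page_stats`) are hypothetical stand-ins, not the actual parquet-rs types, and the statistics are kept as raw bytes rather than typed values.

```rust
/// Hypothetical stand-in for the thrift-level column index:
/// a struct of arrays, each `num_pages` long.
#[derive(Debug, Clone)]
pub struct ColumnIndexSoA {
    pub null_pages: Vec<bool>,
    pub min_values: Vec<Vec<u8>>,
    pub max_values: Vec<Vec<u8>>,
    pub null_counts: Option<Vec<i64>>,
}

/// Hypothetical per-page view: one struct per page.
#[derive(Debug, Clone, PartialEq)]
pub struct PageStats {
    pub min: Option<Vec<u8>>,
    pub max: Option<Vec<u8>>,
    pub null_count: Option<i64>,
}

/// Convert the struct of arrays into `num_pages` per-page structs.
pub fn to_page_stats(index: ColumnIndexSoA) -> Result<Vec<PageStats>, String> {
    let num_pages = index.null_pages.len();
    if index.min_values.len() != num_pages || index.max_values.len() != num_pages {
        return Err("min/max arrays must have one entry per page".to_string());
    }
    if let Some(counts) = &index.null_counts {
        if counts.len() != num_pages {
            return Err("null_counts must have one entry per page".to_string());
        }
    }

    let null_counts = index.null_counts;
    let pages: Vec<PageStats> = index
        .null_pages
        .into_iter()
        .zip(index.min_values)
        .zip(index.max_values)
        .enumerate()
        .map(|(i, ((is_null_page, min), max))| {
            let null_count = null_counts.as_ref().map(|c| c[i]);
            if is_null_page {
                // An all-null page carries no meaningful min/max.
                PageStats { min: None, max: None, null_count }
            } else {
                PageStats { min: Some(min), max: Some(max), null_count }
            }
        })
        .collect();
    Ok(pages)
}
```

Note that every page ends up owning its own `min` and `max` buffers, which is where the per-page allocation cost in this layout comes from.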
   
   > As you say, perhaps we could keep around a Bytes with the byte statistics 
in it, and store an offset there (rather than copying into their own structure).
   
   I'll try playing around with that and see if it helps.
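
One way the shared-buffer idea could look is sketched below, assuming the decoder records byte ranges while it parses instead of copying each value. `PackedStats` and its methods are hypothetical names, and `Arc<[u8]>` stands in for a `Bytes` buffer so the example needs only the standard library.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical layout: one shared buffer plus per-page byte ranges,
/// instead of one owned allocation per min/max value.
pub struct PackedStats {
    buffer: Arc<[u8]>,
    min_ranges: Vec<Range<usize>>,
    max_ranges: Vec<Range<usize>>,
}

impl PackedStats {
    /// Build from a shared buffer and the ranges a decoder recorded in it.
    pub fn new(
        buffer: Arc<[u8]>,
        min_ranges: Vec<Range<usize>>,
        max_ranges: Vec<Range<usize>>,
    ) -> Result<Self, String> {
        if min_ranges.len() != max_ranges.len() {
            return Err("need one min and one max range per page".to_string());
        }
        let in_bounds = |r: &Range<usize>| r.start <= r.end && r.end <= buffer.len();
        if !min_ranges.iter().chain(&max_ranges).all(in_bounds) {
            return Err("range falls outside the buffer".to_string());
        }
        Ok(Self { buffer, min_ranges, max_ranges })
    }

    pub fn num_pages(&self) -> usize {
        self.min_ranges.len()
    }

    /// Borrow the min bytes for a page without copying them.
    pub fn min(&self, page: usize) -> Option<&[u8]> {
        self.min_ranges.get(page).map(|r| &self.buffer[r.clone()])
    }

    /// Borrow the max bytes for a page without copying them.
    pub fn max(&self, page: usize) -> Option<&[u8]> {
        self.max_ranges.get(page).map(|r| &self.buffer[r.clone()])
    }
}
```

Ranges are validated once in `new`, so the accessors can slice the buffer directly without a bounds failure.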
   
   Also, I think this came up before, but only materializing the column index 
for columns being filtered on rather than for the entire schema would certainly 
help. Selectively writing them would be useful as well.
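
A rough sketch of the selective-materialization idea follows, assuming the still-encoded index bytes for each column are available and the caller knows which column positions a filter touches. `decode_selected` and the `decode` callback are hypothetical; no such function is claimed to exist in parquet-rs.

```rust
use std::collections::HashSet;

/// Decode the column index only for the columns in `filter_columns`;
/// every other column stays undecoded and is reported as `None`.
pub fn decode_selected<T, F>(
    raw_indexes: &[Vec<u8>],
    filter_columns: &HashSet<usize>,
    mut decode: F,
) -> Vec<Option<T>>
where
    F: FnMut(&[u8]) -> T,
{
    raw_indexes
        .iter()
        .enumerate()
        .map(|(column, bytes)| {
            if filter_columns.contains(&column) {
                Some(decode(bytes.as_slice()))
            } else {
                None
            }
        })
        .collect()
}
```

The output keeps one slot per column, so positions still line up with the schema while the decoding cost is paid only for filtered columns.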


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
