etseidl commented on PR #8160: URL: https://github.com/apache/arrow-rs/pull/8160#issuecomment-3197722865
> > > > What drives the need to convert to array of structs? Is that the representation of the ColumnIndex in Rust or is it something about how the thrift is encoded?

Yes...parquet-rs takes the existing `ColumnIndex`, which is a struct of arrays, each `num_pages` in length, and turns that into `num_pages` `PageIndex` objects contained in a `NativeIndex`, which is then encapsulated in an `Index` enum variant. While we're remodeling we could blow that up, but I think that would have a pretty big ripple effect downstream.

> As you say, perhaps we could keep around a Bytes with the byte statistics in it, and store an offset there (rather than copying into their own structure).

I'll try playing around with that and see if it helps. Also, I think this came up before, but only materializing the column index for columns being filtered on rather than for the entire schema would certainly help. Selectively writing them would be useful as well.
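For anyone following along, here is a rough sketch of the two shapes being discussed. The types are simplified stand-ins I made up for illustration (`ColumnIndexSoA`, `PageStats`, `PageStatsRef`), not the actual parquet-rs definitions, and the shared buffer is a plain `Vec<u8>` rather than a `Bytes`. In this sketch the buffer is built by concatenation, whereas the idea above would presumably point into bytes that are already in memory.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the Thrift-style column index: a struct of
/// arrays, each `num_pages` long. Not the real parquet-rs type.
struct ColumnIndexSoA {
    null_pages: Vec<bool>,
    min_values: Vec<Vec<u8>>,
    max_values: Vec<Vec<u8>>,
    null_counts: Option<Vec<i64>>,
}

/// Hypothetical per-page entry (one element of the array of structs).
#[derive(Debug, PartialEq)]
struct PageStats {
    min: Option<Vec<u8>>,
    max: Option<Vec<u8>>,
    null_count: Option<i64>,
}

/// Copying conversion: builds one owned `PageStats` per page.
fn to_array_of_structs(index: &ColumnIndexSoA) -> Vec<PageStats> {
    (0..index.null_pages.len())
        .map(|i| {
            let is_null = index.null_pages[i];
            PageStats {
                // All-null pages carry no meaningful min/max.
                min: (!is_null).then(|| index.min_values[i].clone()),
                max: (!is_null).then(|| index.max_values[i].clone()),
                null_count: index.null_counts.as_ref().map(|c| c[i]),
            }
        })
        .collect()
}

/// Offset-based alternative: each page records only the byte ranges of its
/// min and max inside one shared buffer.
struct PageStatsRef {
    min: Option<Range<usize>>,
    max: Option<Range<usize>>,
}

/// Packs all statistics into a single buffer and returns per-page ranges.
fn to_offsets(index: &ColumnIndexSoA) -> (Vec<u8>, Vec<PageStatsRef>) {
    let mut buf = Vec::new();
    let mut pages = Vec::with_capacity(index.null_pages.len());
    for i in 0..index.null_pages.len() {
        if index.null_pages[i] {
            pages.push(PageStatsRef { min: None, max: None });
            continue;
        }
        let min_start = buf.len();
        buf.extend_from_slice(&index.min_values[i]);
        let max_start = buf.len();
        buf.extend_from_slice(&index.max_values[i]);
        pages.push(PageStatsRef {
            min: Some(min_start..max_start),
            max: Some(max_start..buf.len()),
        });
    }
    (buf, pages)
}
```

The first function allocates two `Vec<u8>` per non-null page, while the second keeps a single allocation plus small fixed-size range entries, which is roughly the saving the offset approach is after.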