etseidl opened a new pull request, #8191: URL: https://github.com/apache/arrow-rs/pull/8191
# Which issue does this PR close?

**Note: this targets a feature branch, not main**

- Part of #5854.

# Rationale for this change

Parsing the column index is _very_ slow. Most of the cost comes from converting the Thrift structure (a struct of arrays) into an array of structs. For binary columns, this conversion results in a large number of allocations.

This is an experiment in creating a new structure to hold the column index information that is a little friendlier to parse. It may also be easier to consume on the DataFusion side.

# What changes are included in this PR?

A new `ColumnIndexMetaData` enum is added, along with a type-parameterized `NativeColumnIndex` struct.

# Are these changes tested?

No, this is an experiment only. If this work can be honed into an acceptable `Index` replacement, tests will be added at that time.

# Are there any user-facing changes?

Yes, this would be a radical change to the column indexes in `ParquetMetaData`.
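To make the "struct of arrays vs. array of structs" point concrete, here is a minimal sketch of what a parse-friendly layout could look like. Only the names `ColumnIndexMetaData` and `NativeColumnIndex` come from this PR; every field, the `ThriftColumnIndex` mirror, the `FromPlain` trait, `ByteArrayColumnIndex`, and all helper functions are hypothetical and do not reflect the actual arrow-rs code. The sketch converts from an already-decoded Thrift-style struct purely for illustration. In practice the savings would come from parsing straight into such a layout. The key idea is that per-page values stay in flat vectors, and the hypothetical byte-array variant packs all page minimums (and maximums) into one buffer plus offsets, so the allocation count no longer grows with the number of pages.

```rust
use std::convert::TryInto;

/// Hypothetical mirror of the Thrift `ColumnIndex`: one list per field
/// (a struct of arrays), with min/max held as plain-encoded bytes per page.
struct ThriftColumnIndex {
    null_pages: Vec<bool>,
    min_values: Vec<Vec<u8>>,
    max_values: Vec<Vec<u8>>,
    null_counts: Option<Vec<i64>>,
}

/// Hypothetical trait: decode one fixed-width, little-endian plain value.
trait FromPlain: Sized + Default {
    fn from_plain(bytes: &[u8]) -> Option<Self>;
}

impl FromPlain for i32 {
    fn from_plain(bytes: &[u8]) -> Option<Self> {
        Some(i32::from_le_bytes(bytes.try_into().ok()?))
    }
}

impl FromPlain for i64 {
    fn from_plain(bytes: &[u8]) -> Option<Self> {
        Some(i64::from_le_bytes(bytes.try_into().ok()?))
    }
}

/// Decode every page's bytes; all-null pages get a `T::default()` placeholder.
fn decode_all<T: FromPlain>(values: &[Vec<u8>], null_pages: &[bool]) -> Option<Vec<T>> {
    values
        .iter()
        .zip(null_pages)
        .map(|(v, &is_null)| if is_null { Some(T::default()) } else { T::from_plain(v) })
        .collect()
}

/// Hypothetical primitive index: still a struct of arrays, with min/max
/// decoded once into flat `Vec<T>`s (no per-page heap allocation).
struct NativeColumnIndex<T> {
    null_pages: Vec<bool>,
    min_values: Vec<T>, // placeholder entries for all-null pages
    max_values: Vec<T>,
    null_counts: Option<Vec<i64>>,
}

impl<T: FromPlain> NativeColumnIndex<T> {
    /// Returns `None` if any non-null page has a value of the wrong width.
    fn try_from_thrift(index: ThriftColumnIndex) -> Option<Self> {
        let min_values = decode_all::<T>(&index.min_values, &index.null_pages)?;
        let max_values = decode_all::<T>(&index.max_values, &index.null_pages)?;
        Some(Self { null_pages: index.null_pages, min_values, max_values, null_counts: index.null_counts })
    }

    fn min_value(&self, page: usize) -> Option<&T> {
        if self.null_pages[page] { None } else { self.min_values.get(page) }
    }

    fn max_value(&self, page: usize) -> Option<&T> {
        if self.null_pages[page] { None } else { self.max_values.get(page) }
    }
}

/// Hypothetical byte-array index: all page minimums share one buffer (same for
/// maximums), sliced by offsets, instead of one `Vec<u8>` per page.
struct ByteArrayColumnIndex {
    null_pages: Vec<bool>,
    min_bytes: Vec<u8>,
    min_offsets: Vec<usize>, // len = pages + 1
    max_bytes: Vec<u8>,
    max_offsets: Vec<usize>,
    null_counts: Option<Vec<i64>>,
}

/// Concatenate per-page values into one buffer plus start/end offsets.
fn concat(values: &[Vec<u8>]) -> (Vec<u8>, Vec<usize>) {
    let mut buf: Vec<u8> = Vec::with_capacity(values.iter().map(|v| v.len()).sum());
    let mut offsets: Vec<usize> = Vec::with_capacity(values.len() + 1);
    offsets.push(0);
    for v in values {
        buf.extend_from_slice(v);
        offsets.push(buf.len());
    }
    (buf, offsets)
}

impl ByteArrayColumnIndex {
    fn from_thrift(index: ThriftColumnIndex) -> Self {
        let (min_bytes, min_offsets) = concat(&index.min_values);
        let (max_bytes, max_offsets) = concat(&index.max_values);
        Self { null_pages: index.null_pages, min_bytes, min_offsets, max_bytes, max_offsets, null_counts: index.null_counts }
    }

    fn min_value(&self, page: usize) -> Option<&[u8]> {
        if self.null_pages[page] { return None; }
        Some(&self.min_bytes[self.min_offsets[page]..self.min_offsets[page + 1]])
    }
}

/// Hypothetical: one variant per physical type, so callers match once per column.
enum ColumnIndexMetaData {
    Int32(NativeColumnIndex<i32>),
    Int64(NativeColumnIndex<i64>),
    ByteArray(ByteArrayColumnIndex),
}

impl ColumnIndexMetaData {
    fn num_pages(&self) -> usize {
        match self {
            ColumnIndexMetaData::Int32(i) => i.null_pages.len(),
            ColumnIndexMetaData::Int64(i) => i.null_pages.len(),
            ColumnIndexMetaData::ByteArray(i) => i.null_pages.len(),
        }
    }
}
```

With a layout like this, a consumer (for example, page pruning on the DataFusion side) would match on the enum once per column and then index the flat vectors by page number, rather than walking a `Vec` of per-page structs.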