alamb opened a new pull request, #8340: URL: https://github.com/apache/arrow-rs/pull/8340
# Which issue does this PR close?

- Part of #8000
- Follow on to https://github.com/apache/arrow-rs/pull/8080

# Rationale for this change

The current ParquetMetadataDecoder intermixes three things:

1. The state machine for decoding Parquet metadata (footer, then metadata, then (optional) indexes)
2. Orchestrating IO (i.e. calling read, etc.)
3. Decoding Thrift-encoded bytes into objects

This makes it almost impossible to add features like "only decode a subset of the columns in the ColumnIndex" and other potentially advanced use cases.

Now that we have a "push" style API for metadata decoding that avoids IO, the next step is to extract the actual work into this API so that the existing ParquetMetadataDecoder just calls into the PushDecoder.

# What changes are included in this PR?

1. Extract the decoding state machine into PushMetadataDecoder
2. Update ParquetMetadataDecoder to use the PushMetadataDecoder
3. Extract the bytes --> object code into its own module

This will almost certainly conflict with @etseidl's plans in thrift-remodel.

# Are these changes tested?

By existing tests.

# Are there any user-facing changes?

Not really. This is an internal change that will make it easier to add features like "only decode a subset of the columns in the ColumnIndex", for example.
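
For readers unfamiliar with the "push" style, the split described in the rationale (a state machine that never performs IO, driven by a caller that does) can be sketched roughly as follows. This is a toy illustration, not the arrow-rs API: `ToyPushDecoder`, `DecodeResult`, `push_range`, `try_decode`, and `decode_from_slice` are hypothetical names invented here, the index-decoding state is omitted, and the "metadata" is returned as raw bytes instead of being Thrift-decoded. The only format detail it relies on is that a Parquet file ends with a 4-byte little-endian metadata length followed by the magic `PAR1`.

```rust
use std::ops::Range;

/// Outcome of one decode attempt (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum DecodeResult {
    /// The decoder cannot make progress until the caller supplies this byte range.
    NeedsData(Range<u64>),
    /// Raw metadata bytes (a real decoder would Thrift-decode them into objects).
    Finished(Vec<u8>),
}

/// Decoding states: footer first, then the metadata itself.
#[derive(Debug)]
enum State {
    NeedFooter,
    NeedMetadata { len: u64 },
    Done,
}

/// A toy push decoder: it owns the state machine but never performs IO.
struct ToyPushDecoder {
    file_len: u64,
    state: State,
    buffers: Vec<(Range<u64>, Vec<u8>)>,
}

impl ToyPushDecoder {
    fn new(file_len: u64) -> Self {
        Self { file_len, state: State::NeedFooter, buffers: Vec::new() }
    }

    /// The caller pushes bytes it has fetched by whatever means it likes.
    fn push_range(&mut self, range: Range<u64>, data: Vec<u8>) {
        assert_eq!(data.len() as u64, range.end - range.start);
        self.buffers.push((range, data));
    }

    /// Look up a previously pushed range (exact match only, for simplicity).
    fn get(&self, range: &Range<u64>) -> Option<&[u8]> {
        self.buffers.iter().find(|(r, _)| r == range).map(|(_, b)| b.as_slice())
    }

    /// Advance the state machine as far as the buffered data allows.
    fn try_decode(&mut self) -> Result<DecodeResult, String> {
        loop {
            match self.state {
                State::NeedFooter => {
                    if self.file_len < 8 {
                        return Err("file too small for a footer".into());
                    }
                    let range = self.file_len - 8..self.file_len;
                    let Some(footer) = self.get(&range) else {
                        return Ok(DecodeResult::NeedsData(range));
                    };
                    if &footer[4..8] != b"PAR1" {
                        return Err("bad magic".into());
                    }
                    let len = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as u64;
                    if len + 8 > self.file_len {
                        return Err("metadata length exceeds file size".into());
                    }
                    self.state = State::NeedMetadata { len };
                }
                State::NeedMetadata { len } => {
                    let end = self.file_len - 8;
                    let range = end - len..end;
                    let Some(bytes) = self.get(&range) else {
                        return Ok(DecodeResult::NeedsData(range));
                    };
                    let metadata = bytes.to_vec();
                    self.state = State::Done;
                    return Ok(DecodeResult::Finished(metadata));
                }
                State::Done => return Err("decoder already finished".into()),
            }
        }
    }
}

/// The IO side: a thin driver that only fetches the ranges the decoder asks for.
fn decode_from_slice(file: &[u8]) -> Result<Vec<u8>, String> {
    let mut decoder = ToyPushDecoder::new(file.len() as u64);
    loop {
        match decoder.try_decode()? {
            DecodeResult::NeedsData(range) => {
                let bytes = file[range.start as usize..range.end as usize].to_vec();
                decoder.push_range(range, bytes);
            }
            DecodeResult::Finished(metadata) => return Ok(metadata),
        }
    }
}

fn main() {
    // A fake "file": leading magic, 4 bytes of pretend metadata, length, trailing magic.
    let mut file = b"PAR1".to_vec();
    file.extend_from_slice(b"meta");
    file.extend_from_slice(&4u32.to_le_bytes());
    file.extend_from_slice(b"PAR1");
    let metadata = decode_from_slice(&file).expect("decode failed");
    println!("decoded {} metadata bytes", metadata.len());
}
```

In this shape, the driver (`decode_from_slice` here) is the only place that touches the data source, so a sync reader, an async reader, or a cache could each drive the same state machine without duplicating the decoding logic.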