alamb commented on code in PR #8376:
URL: https://github.com/apache/arrow-rs/pull/8376#discussion_r2365559123
##########
parquet/src/file/serialized_reader.rs:
##########
@@ -732,8 +737,12 @@ impl SerializedPageReaderContext {
_page_index: usize,
_dictionary_page: bool,
) -> Result<PageHeader> {
- let mut prot = TCompactInputProtocol::new(input);
- Ok(PageHeader::read_from_in_protocol(&mut prot)?)
+ let mut prot = ThriftReadInputProtocol::new(input);
Review Comment:
Something I was thinking about last night was: how would we implement decoding
statistics / metadata for only a subset of columns and/or row groups?
This PR plumbs the flag for reading page statistics down, but I wonder if it
would make sense to start collecting the decoder functions into a struct:
```rust
pub struct ParquetThriftDecoder {
    // whether to decode the statistics stored in page headers
    read_page_stats: bool,
    // which columns to read detailed statistics for
    read_column_statistics: Vec<bool>,
    // ....
}
```
It seems like `SerializedPageReaderContext` kind of fills this role, but
it only applies to a subset of the decoding (page headers) 🤔
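To make the idea concrete, here is a minimal, self-contained sketch of how such a struct might be consulted while decoding. It is hypothetical throughout: the `read_row_groups` field and the `all`, `with_column_statistics`, `with_row_groups`, `should_decode_stats` and `selected_stats` helpers are illustrative names, not part of the `parquet` crate. A real decoder would skip the corresponding Thrift statistics field in the input stream rather than just collect `(row_group, column)` pairs.

```rust
/// Hypothetical extension of the struct sketched above. None of these
/// names exist in the `parquet` crate; they are for illustration only.
#[derive(Debug, Clone)]
pub struct ParquetThriftDecoder {
    /// Whether to decode the statistics stored in page headers
    read_page_stats: bool,
    /// Per-column flag: decode detailed column chunk statistics for this column
    read_column_statistics: Vec<bool>,
    /// Optional per-row-group selection; `None` means "all row groups"
    read_row_groups: Option<Vec<bool>>,
}

impl ParquetThriftDecoder {
    /// Decode everything, i.e. the behavior without any selection
    pub fn all(num_columns: usize) -> Self {
        Self {
            read_page_stats: true,
            read_column_statistics: vec![true; num_columns],
            read_row_groups: None,
        }
    }

    /// Only decode column statistics for the listed column indices
    /// (out-of-range indices are ignored)
    pub fn with_column_statistics(mut self, columns: &[usize]) -> Self {
        self.read_column_statistics.fill(false);
        for &c in columns {
            if let Some(flag) = self.read_column_statistics.get_mut(c) {
                *flag = true;
            }
        }
        self
    }

    /// Only decode statistics for row groups whose flag is `true`
    pub fn with_row_groups(mut self, row_groups: Vec<bool>) -> Self {
        self.read_row_groups = Some(row_groups);
        self
    }

    pub fn read_page_stats(&self) -> bool {
        self.read_page_stats
    }

    /// Should statistics for `column` in `row_group` be decoded (true) or skipped (false)?
    pub fn should_decode_stats(&self, row_group: usize, column: usize) -> bool {
        let row_group_selected = match &self.read_row_groups {
            None => true,
            Some(selection) => selection.get(row_group).copied().unwrap_or(false),
        };
        row_group_selected
            && self.read_column_statistics.get(column).copied().unwrap_or(false)
    }
}

/// Collect the (row_group, column) pairs whose statistics would be decoded.
/// A real decoder would instead skip the corresponding Thrift field in place.
fn selected_stats(
    decoder: &ParquetThriftDecoder,
    num_row_groups: usize,
    num_columns: usize,
) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for rg in 0..num_row_groups {
        for col in 0..num_columns {
            if decoder.should_decode_stats(rg, col) {
                out.push((rg, col));
            }
        }
    }
    out
}

fn main() {
    // 3 row groups x 4 columns: keep statistics only for columns 0 and 2
    // in row groups 0 and 2
    let decoder = ParquetThriftDecoder::all(4)
        .with_column_statistics(&[0, 2])
        .with_row_groups(vec![true, false, true]);
    println!("page stats: {}", decoder.read_page_stats()); // page stats: true
    println!("{:?}", selected_stats(&decoder, 3, 4)); // [(0, 0), (0, 2), (2, 0), (2, 2)]
}
```

Keeping the selection in one place like this would let every decode function (page headers, column chunk metadata, row group metadata) ask the same object what to skip, instead of threading individual flags through each call.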