opensourcegeek commented on issue #4328:
URL: https://github.com/apache/arrow-rs/issues/4328#issuecomment-2100542181
That makes sense - thanks @alamb
```rust
pub fn parquet_stats_to_arrow(
    arrow_datatype: &DataType,
    statistics: impl IntoIterator<Item = Option<&Statistics>>,
) -> Result<ArrowStatistics> {
    todo!()
}
```
I'm trying to suss out the details of implementing the above function.
Below are my questions (probably very basic, apologies), using your
`a = 5` example:
- `arrow_datatype`: will this be `a`'s Arrow data type, e.g. Int64 or the like?
- `impl IntoIterator<Item = Option<&Statistics>>`: will this be the Parquet
Statistics of all columns in the 'current' row group, so that I'd have to fish
out the entry for `a`? I'm not sure I've interpreted this correctly, because to
fish out the statistics for `a` I'd need to know which entry belongs to `a`. So
I'm wondering whether it is already the Parquet Statistics for `a` only. If
that's the case, why is it `impl IntoIterator` and not just
`Option<&Statistics>`?
- `Result<ArrowStatistics>`: once I get a handle on `a`'s Parquet
[Statistics](https://docs.rs/parquet/latest/parquet/file/statistics/enum.Statistics.html),
I think I'd need to convert each of the
[ValueStatistics](https://docs.rs/parquet/latest/parquet/file/statistics/struct.ValueStatistics.html)
to an [ArrayRef](https://docs.rs/arrow/latest/arrow/array/type.ArrayRef.html)
based on `a`'s type? I couldn't find `row_count()` in `ValueStatistics` though.
Sorry, just trying to get an understanding of all the moving parts.
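
To make the iterator question concrete, here is a minimal sketch of one possible reading. It assumes (not confirmed in this thread) that the iterator yields one entry per row group for column `a` only, with `None` where a row group has no statistics. `ColumnStats`, `MinMax`, and `stats_to_min_max` are hypothetical stand-ins written with the standard library only. They are not the `parquet` or `arrow` crate types, and the `Vec<Option<i64>>` columns merely play the role an `ArrayRef` would play.

```rust
// Hypothetical stand-in for per-column Parquet statistics.
// NOT the real `parquet::file::statistics::Statistics` enum.
#[derive(Debug, Clone, PartialEq)]
enum ColumnStats {
    Int64 { min: Option<i64>, max: Option<i64> },
}

// Hypothetical stand-in for the proposed `ArrowStatistics`:
// one slot per row group, `None` where the value is unknown.
#[derive(Debug, PartialEq)]
struct MinMax {
    min: Vec<Option<i64>>,
    max: Vec<Option<i64>>,
}

// Assumption: each item is the statistics of the SAME column
// in a different row group, in row group order.
fn stats_to_min_max<'a>(
    stats: impl IntoIterator<Item = Option<&'a ColumnStats>>,
) -> MinMax {
    let mut min = Vec::new();
    let mut max = Vec::new();
    for s in stats {
        match s {
            // Statistics present: copy whatever bounds are recorded.
            Some(ColumnStats::Int64 { min: lo, max: hi }) => {
                min.push(*lo);
                max.push(*hi);
            }
            // No statistics for this row group: keep a null slot so
            // positions still line up with row group indices.
            None => {
                min.push(None);
                max.push(None);
            }
        }
    }
    MinMax { min, max }
}
```

Under this reading, an iterator makes sense because the output has one element per row group, so a predicate such as `a = 5` could be checked against all row groups at once. If the parameter really does cover all columns of a single row group, this sketch does not apply.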