tustvold commented on issue #5270:
URL: https://github.com/apache/arrow-rs/issues/5270#issuecomment-1873948824
OK, so the statistics are present in the file:
```
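use std::fs::File;

use parquet::file::reader::{FileReader, SerializedFileReader};

#[test]
fn test_example() {
    // Local copy of the file under discussion.
    let file = File::open("/home/raphael/Downloads/no_stats.parquet").unwrap();
    let reader = SerializedFileReader::new(file).unwrap();
    // Column chunk metadata for the first column of the first row group.
    let metadata = reader.metadata().row_group(0).column(0);
    // `statistics()` returns `None` when the chunk carries no statistics.
    let col_stats = metadata.statistics().unwrap();
    println!("{col_stats:?}");
}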
```
```
ByteArray({min: Some(ByteArray { data: "01" }), max: Some(ByteArray { data: "01" }), distinct_count: None, null_count: 0, min_max_deprecated: false, min_max_backwards_compatible: false, max_value_exact: false, min_value_exact: false})
```
So the question now is why pyarrow is unhappy with those statistics. I
vaguely remember some bug or limitation in pyarrow related to this; let me
see if I can dig it out.
> is much slower than on files created with PyArrow
This could be for a very large number of reasons.