tfiasco opened a new issue #799:
URL: https://github.com/apache/arrow-rs/issues/799
**Describe the bug**
A Parquet file written by `arrow-rs` shows no min/max statistics when it is
read with `pyarrow`, even though `arrow-rs` itself reads the statistics back correctly.
**To Reproduce**
```rust
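// rust code
// Import paths assumed for the parquet/arrow 5.4.0 crates listed under "Additional context".
use std::fs;
use std::sync::Arc;

use arrow::array::Int32Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::{ArrowWriter, ParquetFileArrowReader};
use parquet::basic::Compression;
use parquet::file::properties::WriterProperties;
use parquet::file::reader::SerializedFileReader;

fn main() {
    // Two non-nullable Int32 columns with known min/max values.
    let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
    let id_array2 = Int32Array::from(vec![2, 3, 4, 5, 6]);
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int32, false),
        Field::new("id2", DataType::Int32, false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(id_array), Arc::new(id_array2)],
    )
    .unwrap();

    // Write the batch with statistics explicitly enabled.
    let writer_properties = WriterProperties::builder()
        .set_compression(Compression::ZSTD)
        .set_statistics_enabled(true)
        .build();
    let path = "/.../test.parquet";
    let file = fs::File::create(&path).unwrap();
    let mut writer =
        ArrowWriter::try_new(file, schema.clone(), Some(writer_properties)).unwrap();
    writer.write(&batch).unwrap();
    writer.close().unwrap();

    // Read the file back and print the column statistics of row group 0.
    let file2 = fs::File::open(&path).unwrap();
    let file_reader = SerializedFileReader::new(file2).unwrap();
    let mut arrow_reader = ParquetFileArrowReader::new(Arc::new(file_reader));
    println!(
        "statistics: {:?}",
        arrow_reader
            .get_metadata()
            .row_group(0)
            .column(0)
            .statistics()
    );
    println!(
        "statistics: {:?}",
        arrow_reader
            .get_metadata()
            .row_group(0)
            .column(1)
            .statistics()
    );
}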
// output:
// statistics: Some(Int32({min: Some(1), max: Some(5), distinct_count: None,
// null_count: 0, min_max_deprecated: false}))
// statistics: Some(Int32({min: Some(2), max: Some(6), distinct_count: None,
// null_count: 0, min_max_deprecated: false}))
```
```python
# python code
import pyarrow.parquet as pq
f = pq.ParquetFile('./test.parquet')
print(f.metadata.row_group(0).column(0).statistics)
# output:
"""
<pyarrow._parquet.Statistics object at 0x7fbf8d409dd0>
has_min_max: False
min: None
max: None
null_count: 0
distinct_count: 0
num_values: 5
physical_type: INT32
logical_type: None
converted_type (legacy): NONE
"""
```
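The Python check above only inspects column 0 of row group 0. To see whether the problem affects every column, the same check could be generalized roughly as follows. This is a minimal sketch: `columns_missing_min_max` is a hypothetical helper name, not part of pyarrow, and it only assumes the metadata attributes already used above (`row_group`, `column`, `statistics`, `has_min_max`) plus `num_row_groups`, `num_columns`, and `path_in_schema`.

```python
def columns_missing_min_max(metadata):
    """Return (row_group_index, column_path) pairs that lack min/max statistics.

    `metadata` is expected to behave like pyarrow's FileMetaData,
    for example `pq.ParquetFile(path).metadata`.
    """
    missing = []
    for rg_index in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_index)
        for col_index in range(row_group.num_columns):
            column = row_group.column(col_index)
            stats = column.statistics  # None when the column chunk has no statistics
            if stats is None or not stats.has_min_max:
                missing.append((rg_index, column.path_in_schema))
    return missing
```

With pyarrow installed, it would be called as `columns_missing_min_max(pq.ParquetFile('./test.parquet').metadata)`. Given the output shown above, the result for this file should at least include `(0, 'id')`.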
**Expected behavior**
`pyarrow` should report the min/max statistics that `arrow-rs` wrote, for example for column 0:
```
has_min_max: True
min: 1
max: 5
```
**Additional context**
rust lib version:
```
parquet = "5.4.0"
arrow = "5.4.0"
```
python lib version:
```
pyarrow==5.0.0
```