Liyixin95 opened a new issue, #5625:
URL: https://github.com/apache/arrow-rs/issues/5625
**Describe the bug**

As the title says, `ParquetRecordBatchReader` does not recognize the duration
type written by pandas or polars: the column is read back as plain `Int64`.
**To Reproduce**

First, prepare a Parquet file with polars:
```python
import polars as pl
from datetime import timedelta

# Write a single duration[μs] column holding 100 one-day values.
df = pl.DataFrame({
    "a": [timedelta(days=1) for _ in range(100)]
})
df.write_parquet("./test.parquet")
```
Then, read it back with arrow-rs:
```rust
use std::fs::File;

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use parquet::errors::Result;

fn main() -> Result<()> {
    // Open the Parquet file written above.
    let path = "./test.parquet";
    let file = File::open(path).unwrap();

    let parquet_reader = ParquetRecordBatchReaderBuilder::try_new(file)?
        .with_batch_size(8192)
        .build()?;

    let mut batches = Vec::new();
    for batch in parquet_reader {
        batches.push(batch?);
    }

    println!("{:#?}", batches[0].schema());
    Ok(())
}
```
Finally, the schema we get is:
```
Schema {
fields: [
Field {
name: "a",
data_type: Int64,
nullable: true,
dict_id: 0,
dict_is_ordered: false,
metadata: {},
},
],
metadata: {},
}
```
**Expected behavior**

The column should come back as a duration type, matching what the Python readers report.

polars result:
```
shape: (100, 1)
┌──────────────┐
│ a │
│ --- │
│ duration[μs] │
╞══════════════╡
│ 1d │
│ 1d │
│ 1d │
│ 1d │
│ 1d │
│ … │
│ 1d │
│ 1d │
│ 1d │
│ 1d │
│ 1d │
└──────────────┘
```
pandas result:
```
a
0 1 days
1 1 days
2 1 days
3 1 days
4 1 days
.. ...
95 1 days
96 1 days
97 1 days
98 1 days
99 1 days
[100 rows x 1 columns]
```