AlenkaF commented on issue #47592:
URL: https://github.com/apache/arrow/issues/47592#issuecomment-3312003727
In your example you are mixing a single-file write (which also embeds pandas metadata in the file) with a hive-partitioned read, and that mismatch is what causes the issue. To make the example work, both the write and the read need to use partitioning:
```python
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "0": [1, 2, 3],
...     "1": [4, 5, 6],
...     "run_date": ["2025-09-17", "2025-09-17", "2025-09-17"],
... })
>>> df.to_parquet("./test-pd-data", partition_cols=["run_date"])
>>>
>>> from pyarrow.dataset import dataset
>>> ds = dataset("test-pd-data", format="parquet", partitioning="hive")
>>> table = ds.to_table()
>>> print(table.schema)
0: int64
1: int64
run_date: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' +
620
>>> table.to_pandas()
   0  1    run_date
0  1  4  2025-09-17
1  2  5  2025-09-17
2  3  6  2025-09-17
```
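For completeness, the partitioned directory can also be read back with pandas directly, which routes through the same pyarrow dataset machinery. This is a minimal sketch on top of the example above, not from the original comment; note that the partition column typically comes back as a pandas categorical here:

```python
import pandas as pd

# Read the hive-partitioned directory written above. The pyarrow engine
# (the pandas default) discovers the run_date=... subdirectories and
# restores run_date as a column.
df_back = pd.read_parquet("test-pd-data")

print(df_back.dtypes)  # run_date is typically category dtype here
```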
Just to confirm, are you only seeing issues on the read side? I’m a little
confused about the exact problem, so a clearer description and the error output
would really help.