AlenkaF commented on issue #47592:
URL: https://github.com/apache/arrow/issues/47592#issuecomment-3312003727
In your example you are mixing a single-file write (which also embeds pandas metadata in the file) with a hive-partitioned read, and that mismatch is what causes the issue. To make the example work, both the write and the read need to use partitioning:
```python
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "0": [1, 2, 3],
...     "1": [4, 5, 6],
...     "run_date": ["2025-09-17", "2025-09-17", "2025-09-17"],
... })
>>> df.to_parquet("./test-pd-data", partition_cols=["run_date"])
>>>
>>> from pyarrow.dataset import dataset
>>> ds = dataset("test-pd-data", format="parquet", partitioning="hive")
>>> table = ds.to_table()
>>> print(table.schema)
0: int64
1: int64
run_date: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' +
620
>>> table.to_pandas()
   0  1    run_date
0  1  4  2025-09-17
1  2  5  2025-09-17
2  3  6  2025-09-17
```
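For completeness, the partitioned directory can also be read back with pandas directly, which routes through the same pyarrow dataset machinery. This is a minimal sketch on top of the example above, not from the original comment; note that the partition column typically comes back as a pandas categorical here:

```python
import pandas as pd

# Read the hive-partitioned directory written above. The pyarrow engine
# (the pandas default) discovers the run_date=... subdirectories and
# restores run_date as a column.
df_back = pd.read_parquet("test-pd-data")

print(df_back.dtypes)  # run_date is typically category dtype here
```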
Just to confirm, are you only seeing issues on the read side? I’m a little
confused about the exact problem, so a clearer description and the error output
would really help.