[GitHub] [arrow] Demetrio92 commented on issue #1688: Possible to read categoricals back into Pandas from Parquet using Pyarrow?

GitBox Mon, 29 Mar 2021 14:54:14 -0700


Demetrio92 commented on issue #1688:
URL: https://github.com/apache/arrow/issues/1688#issuecomment-809739042



   It seems the issue is back, but the maintainers are working on it:
   
   https://issues.apache.org/jira/browse/ARROW-11157
   
   > As a workaround, you can read with pyarrow and do the conversion to pandas 
manually. So basically instead of `pd.read_parquet(..)` you can do 
`pyarrow.parquet.read_table(..).to_pandas(..)`.
   
   ------
   
   My 2c: last summer I had a dataset that would not fit into RAM when its 
categorical columns were stored as strings in pandas. That pretty much ruled 
out writing and reading it as `parquet` with `fastparquet`. PyArrow saved the 
day by handling categories properly. 
   
   Would be really nice to see this working again. 
   


