[GitHub] [arrow] Demetrio92 edited a comment on issue #1688: Possible to read categoricals back into Pandas from Parquet using Pyarrow?

GitBox Thu, 18 Jun 2020 06:46:23 -0700


Demetrio92 edited a comment on issue #1688:
URL: https://github.com/apache/arrow/issues/1688#issuecomment-646026392



   Stumbled across this bug again. 
   `pyarrow` preserves `category` as `dtype`, `fastparquet` **does not**. 
   
   Docs don't mention it. They even kinda mislead the users:
   https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-parquet
   
![image](https://user-images.githubusercontent.com/22682408/85027440-39149c80-b17a-11ea-9258-cfa69c1d5b3d.png)
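   ^ This works because the second option has `engine='pyarrow'`!!!
   With the default `engine='auto'`, pandas tries `pyarrow` first and falls back to `fastparquet` only when `pyarrow` is unavailable. If you end up on `fastparquet`, those columns actually come back as `object`.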
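
A minimal sketch for checking this on your own setup, assuming `pandas` is installed along with at least one of the two engines. The helper name `roundtrip_dtypes` and the sample frame are made up for illustration. It passes `engine` explicitly on both write and read, so the result does not depend on what `auto` resolves to:

```python
import os
import tempfile

import pandas as pd


def roundtrip_dtypes(df, engine):
    """Write df to Parquet with `engine`, read it back, return dtypes as strings."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.parquet")
        df.to_parquet(path, engine=engine)
        restored = pd.read_parquet(path, engine=engine)
    return {col: str(dtype) for col, dtype in restored.dtypes.items()}


# Hypothetical sample data: one categorical column, one integer column.
df = pd.DataFrame({"a": pd.Categorical(["x", "y", "x"]), "b": [1, 2, 3]})

results = {}
for engine in ("pyarrow", "fastparquet"):
    try:
        results[engine] = roundtrip_dtypes(df, engine)
    except ImportError:
        # pandas raises ImportError when the requested engine is not installed.
        results[engine] = None

print(results)
```

Comparing the dtype reported for column `a` under each engine shows whether `category` survives the round trip with the versions you have installed.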
   
   _It took me a while to figure out where and why I had memory leaks despite optimizing my dataframes..._
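
To illustrate why silently losing `category` matters for memory, here is a rough sketch comparing the same (made-up) values stored as `category` and as `object`. The exact byte counts depend on the pandas version and platform, so none are quoted here:

```python
import pandas as pd

# Hypothetical data: a low-cardinality string column repeated many times.
values = ["red", "green", "blue"] * 10_000

as_category = pd.Series(values, dtype="category")
as_object = as_category.astype(object)

# deep=True also counts the Python string objects behind an object column.
category_bytes = as_category.memory_usage(deep=True)
object_bytes = as_object.memory_usage(deep=True)

print(category_bytes, object_bytes)
```

A `category` column stores small integer codes plus one copy of each distinct value, so it is much smaller than the `object` version for data like this.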
   
   ------
   
   ```python
   import pandas as pd

   pd.__version__  # '1.0.4' in my session
   ```


