Demetrio92 edited a comment on issue #1688:
URL: https://github.com/apache/arrow/issues/1688#issuecomment-646026392


   Stumbled across this bug again. 
   `pyarrow` preserves the `category` dtype through a Parquet round trip; `fastparquet` **does not**. 
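
A minimal sketch for checking this yourself, assuming `pyarrow` is installed. `make_sample_frame` and `roundtrip_dtypes` are hypothetical helper names introduced here. The sketch writes a frame with a `category` column to a temporary Parquet file and reads it back with the same engine. With `engine='pyarrow'` the `color` column should come back as `category`. Calling it with `engine='fastparquet'` is one way to reproduce the behavior described in this comment, though the outcome on that side may depend on the installed `fastparquet` version.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# The round trip below needs pyarrow installed; this flag lets callers check first.
HAVE_PYARROW = importlib.util.find_spec("pyarrow") is not None


def make_sample_frame() -> pd.DataFrame:
    """Hypothetical sample data: one categorical column, one numeric column."""
    return pd.DataFrame(
        {
            "color": pd.Categorical(["red", "green", "red", "blue"]),
            "value": [1, 2, 3, 4],
        }
    )


def roundtrip_dtypes(df: pd.DataFrame, engine: str = "pyarrow") -> dict:
    """Write df to Parquet and read it back with the same engine.

    Returns a mapping of column name -> dtype name after the round trip.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "roundtrip.parquet")
        df.to_parquet(path, engine=engine)
        # read_parquet loads everything into memory before the temp dir is removed.
        result = pd.read_parquet(path, engine=engine)
    return {col: str(dtype) for col, dtype in result.dtypes.items()}
```

For example, `roundtrip_dtypes(make_sample_frame())["color"]` is expected to be `'category'` when `pyarrow` is available.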
   
   The pandas docs don't mention this. They are even kinda misleading:
   https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-parquet
   
![image](https://user-images.githubusercontent.com/22682408/85027440-39149c80-b17a-11ea-9258-cfa69c1d5b3d.png)
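   ^ this only works because the second call has `engine='pyarrow'`!!!
   So if the default `engine='auto'` ends up using `fastparquet` (pandas tries `pyarrow` first and only falls back to `fastparquet` when `pyarrow` is unavailable), those columns actually come back as `object`. 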
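
Which engine `auto` resolves to therefore depends on what is installed. The sketch below approximates pandas' documented rule (use the `io.parquet.engine` option; when that is `'auto'`, try `pyarrow`, then fall back to `fastparquet`). `resolve_parquet_engine` is a hypothetical helper, not a pandas API, and it only checks that a package is importable, whereas pandas also enforces minimum versions.

```python
import importlib.util
from typing import Optional

import pandas as pd


def resolve_parquet_engine() -> Optional[str]:
    """Approximate which engine pandas picks for engine='auto' (hypothetical helper)."""
    configured = pd.get_option("io.parquet.engine")
    if configured != "auto":
        # An explicitly configured option value wins over auto-detection.
        return configured
    for candidate in ("pyarrow", "fastparquet"):
        # Only checks importability; pandas' own version checks are ignored here.
        if importlib.util.find_spec(candidate) is not None:
            return candidate
    # pandas itself would raise ImportError at this point.
    return None
```

Whatever this returns, passing `engine='pyarrow'` explicitly to both `to_parquet` and `read_parquet` removes the ambiguity.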
   
   _It took me a while to notice where and why my memory usage kept blowing up despite optimizing my dataframes..._
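
To illustrate why losing the `category` dtype shows up as a memory problem, here is a pandas-only sketch with hypothetical data (`column_bytes` is a helper name introduced here). It compares the deep memory footprint of the same low-cardinality column stored as `category` and as `object`, and shows the obvious workaround of re-casting affected columns with `astype("category")` after reading.

```python
import pandas as pd


def column_bytes(series: pd.Series) -> int:
    """Deep memory footprint of one column in bytes (counts string payloads)."""
    return int(series.memory_usage(deep=True, index=False))


# Hypothetical data: 100,000 rows drawn from only three distinct labels.
labels = pd.Series((["red", "green", "blue"] * 33_334)[:100_000])

as_category = labels.astype("category")
as_object = as_category.astype(object)  # what you get back if the dtype is lost

# Workaround sketch: re-cast the affected columns after reading.
restored = as_object.astype("category")

print("category:", column_bytes(as_category))
print("object:  ", column_bytes(as_object))
```

Exact byte counts vary with pandas and Python versions, so none are quoted here. The `object` column is larger because `deep=True` counts a pointer plus a Python string per row, while the categorical stores small integer codes plus one copy of each label.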
   
   ------
   
   ```python
   pd.__version__
   Out[3]: '1.0.4'
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.