Roland Swingler created ARROW-13014:
---------------------------------------

             Summary: Pandas to_feather no longer works - runs out of memory
                 Key: ARROW-13014
                 URL: https://issues.apache.org/jira/browse/ARROW-13014
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 4.0.1, 4.0.0
         Environment: Linux
            Reporter: Roland Swingler


Since upgrading to 4.0.1, writing Feather files with the pandas to_feather
method uses far, far more memory.

For reference, I have a DataFrame that is around 10 GB in size, with 25 million
rows. Writing it to a Feather file took around 3-4 GB of memory with pyarrow
versions up to and including 3.0.0. As of 4.0.1, I don't know how much memory a
successful write would need: I tried running on an AWS machine with 120 GB of
RAM, and that wasn't enough.

I can't share the DataFrame, but here is an outline of the column types and
sizes:

Size (bytes)  Type
   206663144  int64
   206663144  int64
   206663144  float64
   206663144  float64
  2882448709  object
  5813798687  object
   206663144  float64
   206663144  int64
   206663144  int64
   206663144  int64
   206663144  int64
   206663144  float64
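As a sanity check, the column sizes are consistent with the figures quoted in
the description: each int64/float64 value takes 8 bytes, so 206663144 bytes per
numeric column implies 25,832,893 rows, and the columns total 10,762,878,836
bytes (about 10.02 GiB), with the two object columns making up roughly 81% of
that. A small sketch of the arithmetic, with the sizes copied from the table:

```python
# (size in bytes, dtype) for each column, copied from the report's table.
COLUMNS = [
    (206663144, "int64"),
    (206663144, "int64"),
    (206663144, "float64"),
    (206663144, "float64"),
    (2882448709, "object"),
    (5813798687, "object"),
    (206663144, "float64"),
    (206663144, "int64"),
    (206663144, "int64"),
    (206663144, "int64"),
    (206663144, "int64"),
    (206663144, "float64"),
]


def summarize(columns):
    """Return (total bytes, row count, bytes in object columns)."""
    total = sum(size for size, _ in columns)
    # All fixed-width numeric columns share one size; 8 bytes per value.
    (numeric_size,) = {size for size, dtype in columns
                       if dtype in ("int64", "float64")}
    rows = numeric_size // 8
    object_bytes = sum(size for size, dtype in columns if dtype == "object")
    return total, rows, object_bytes


total, rows, object_bytes = summarize(COLUMNS)
print(f"{rows:,} rows, {total / 2**30:.2f} GiB total, "
      f"{object_bytes / total:.0%} in object columns")
```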



--
This message was sent by Atlassian Jira
(v8.3.4#803005)