[ 
https://issues.apache.org/jira/browse/ARROW-13254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373837#comment-17373837
 ] 

Weston Pace commented on ARROW-13254:
-------------------------------------

There are a few reasons it may have worked previously.  If the data or data 
type changed, the amount of memory used by either representation may have 
changed.  It's possible your OS was previously allowing swap, which let you 
exceed the amount of physical memory on the device.  It's also possible the 
amount of available memory on the server has changed because other processes 
are now running that were not running previously.

 

> In terms of convert table in parts, do you mean split the dataframe and take 
> each to pa.Table and then combine?

Yes, but you will need to make sure to delete the old parts of the dataframe 
once they are no longer needed.  For example...

 
{code:python}
df_1 = df.iloc[:1000000, :]
df_2 = df.iloc[1000000:, :]  # iloc is end-exclusive, so the second slice starts at 1000000, not 1000001
del df  # release the full dataframe before converting
table_1 = pa.Table.from_pandas(df_1)
del df_1
table_2 = pa.Table.from_pandas(df_2)
del df_2
{code}

> [Python] Processes killed and semaphore objects leaked when reading pandas 
> data
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-13254
>                 URL: https://issues.apache.org/jira/browse/ARROW-13254
>             Project: Apache Arrow
>          Issue Type: Bug
>         Environment: OS name and version: macOS 11.4
> Python version: 3.8.10
> Pyarrow version: 4.0.1
>            Reporter: Koyomi Akaguro
>            Priority: Major
>
> When I run {{pa.Table.from_pandas(df)}} for a >1G dataframe, it reports
>  
>  {{Killed: 9 
> ../anaconda3/envs/py38/lib/python3.8/multiprocessing/resource_tracker.py:216: 
> UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects 
> to clean up at shutdown}}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
