[jira] [Commented] (ARROW-13254) [Python] Processes killed and semaphore objects leaked when reading pandas data

Weston Pace (Jira) Fri, 02 Jul 2021 14:45:04 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-13254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373801#comment-17373801
 ]


Weston Pace commented on ARROW-13254:
-------------------------------------

First, figure out how large your dataframe is.  df.memory_usage(deep=True) 
reports the bytes used by each column; sum the result to get the total.
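
As a rough sketch of that first step (the helper name and the small stand-in 
dataframe below are hypothetical, not from the report): memory_usage returns 
one byte count per column, so the total is the sum.

```python
import pandas as pd


def dataframe_nbytes(df: pd.DataFrame) -> int:
    """Total bytes used by df, counting the index and object (string) data."""
    # deep=True inspects Python objects in object columns instead of
    # counting only the 8-byte pointers to them.
    return int(df.memory_usage(index=True, deep=True).sum())


# Stand-in data; replace with the real dataframe.
df = pd.DataFrame({"id": range(1000), "name": ["row"] * 1000})
size = dataframe_nbytes(df)
print(f"{size / 1024**2:.2f} MiB")
```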

Second, determine how much memory you have available.  On Linux, the command 
"free -h" reports this.

To convert from pandas safely you will probably need around double the amount 
of memory required to store the dataframe.  If you do not have that much 
memory, you can convert the dataframe in parts.

> [Python] Processes killed and semaphore objects leaked when reading pandas 
> data
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-13254
>                 URL: https://issues.apache.org/jira/browse/ARROW-13254
>             Project: Apache Arrow
>          Issue Type: Bug
>         Environment: OS name and version: macOS 11.4
> Python version: 3.8.10
> Pyarrow version: 4.0.1
>            Reporter: Koyomi Akaguro
>            Priority: Major
>
> When I run {{pa.Table.from_pandas(df)}} for a >1G dataframe, it reports
>  
>  {{Killed: 9 
> ../anaconda3/envs/py38/lib/python3.8/multiprocessing/resource_tracker.py:216: 
> UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects 
> to clean up at shutdown}}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
