[ https://issues.apache.org/jira/browse/ARROW-13254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373876#comment-17373876 ]

Koyomi Akaguro commented on ARROW-13254:
----------------------------------------

[~westonpace] I tried several times, cutting my data down to different sizes. 
I found that with only 1/4 of the data, memory usage doubles from ~3 GB to 
~6 GB and the conversion finishes smoothly. However, with half of the data 
(4.4 GB), memory usage jumps from ~6 GB to ~60 GB and the process is killed. 
It seems that once the data passes some size threshold, 
{{pa.Table.from_pandas}} just blows memory up without bound.
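
A possible workaround sketch: convert the frame in row slices and stitch the 
pieces back together with {{pa.concat_tables}}, so only one slice's conversion 
buffers are live at a time. The stand-in frame and chunk size below are 
placeholders, not the real data:

{code:python}
import pandas as pd
import pyarrow as pa

# Stand-in for the real ~4.4 GB frame; contents are an assumption.
df = pd.DataFrame({"x": range(10_000_000), "y": range(10_000_000)})

# Convert fixed-size row slices one at a time, then combine the
# resulting tables; concat_tables does not copy the chunk buffers.
chunk_rows = 1_000_000  # arbitrary; tune to available memory
tables = [
    pa.Table.from_pandas(df.iloc[i:i + chunk_rows], preserve_index=False)
    for i in range(0, len(df), chunk_rows)
]
table = pa.concat_tables(tables)
{code}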

> [Python] Processes killed and semaphore objects leaked when reading pandas 
> data
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-13254
>                 URL: https://issues.apache.org/jira/browse/ARROW-13254
>             Project: Apache Arrow
>          Issue Type: Bug
>         Environment: OS name and version: macOS 11.4
> Python version: 3.8.10
> Pyarrow version: 4.0.1
>            Reporter: Koyomi Akaguro
>            Priority: Major
>
> When I run {{pa.Table.from_pandas(df)}} on a >1 GB dataframe, the process is killed with:
>  
>  {{Killed: 9 
> ../anaconda3/envs/py38/lib/python3.8/multiprocessing/resource_tracker.py:216: 
> UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects 
> to clean up at shutdown}}
>  
>  
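
For anyone trying to quantify the spike, a minimal measurement sketch around 
the failing call; the generated frame is a stand-in, since the original data 
is not attached to the issue:

{code:python}
import resource

import numpy as np
import pandas as pd
import pyarrow as pa

# Stand-in DataFrame (~300 MB); the reported one is several GB.
df = pd.DataFrame(np.random.rand(5_000_000, 8),
                  columns=[f"c{i}" for i in range(8)])

def peak_rss_mb():
    # ru_maxrss is reported in bytes on macOS, kilobytes on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 2**20

print(f"before: peak RSS {peak_rss_mb():.0f} MB, "
      f"arrow pool {pa.total_allocated_bytes() / 2**20:.0f} MB")
table = pa.Table.from_pandas(df)
print(f"after:  peak RSS {peak_rss_mb():.0f} MB, "
      f"arrow pool {pa.total_allocated_bytes() / 2**20:.0f} MB")
{code}

Comparing the peak RSS jump against the Arrow pool growth should show whether 
the extra memory is held by Arrow itself or by temporaries created during the 
conversion.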


