[ https://issues.apache.org/jira/browse/ARROW-13254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373811#comment-17373811 ]

Koyomi Akaguro commented on ARROW-13254:
----------------------------------------

[~westonpace] If it needs double the amount of memory, then yes, it goes over the 
available memory. Oddly, though, I ran exactly the same code several months ago 
and it worked fine.

Regarding converting the table in parts, do you mean splitting the dataframe, 
converting each piece to a pa.Table, and then combining them?
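
Something like the sketch below is what I have in mind. It is only my guess at 
what you are suggesting; the helper name, the {{chunk_size}} value, and the use 
of {{pa.RecordBatch.from_pandas}} plus {{pa.Table.from_batches}} are my 
assumptions, not anything confirmed in this thread:

{code:python}
import pandas as pd
import pyarrow as pa

def dataframe_to_table_in_parts(df: pd.DataFrame, chunk_size: int = 1_000_000) -> pa.Table:
    """Convert a large DataFrame to a pyarrow Table one row-range at a time.

    Each slice is converted separately, so the temporary buffers created
    during conversion only cover one chunk at a time; the resulting batches
    are then assembled into a single Table without copying them again.
    """
    batches = []
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        batches.append(pa.RecordBatch.from_pandas(chunk, preserve_index=False))
    return pa.Table.from_batches(batches)

# Hypothetical usage:
# table = dataframe_to_table_in_parts(df, chunk_size=500_000)
{code}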

> [Python] Processes killed and semaphore objects leaked when reading pandas 
> data
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-13254
>                 URL: https://issues.apache.org/jira/browse/ARROW-13254
>             Project: Apache Arrow
>          Issue Type: Bug
>         Environment: OS name and version: macOS 11.4
> Python version: 3.8.10
> Pyarrow version: 4.0.1
>            Reporter: Koyomi Akaguro
>            Priority: Major
>
> When I run {{pa.Table.from_pandas(df)}} on a >1 GB dataframe, the process is 
> killed and it reports:
>
> {{Killed: 9}}
> {{../anaconda3/envs/py38/lib/python3.8/multiprocessing/resource_tracker.py:216: 
> UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects 
> to clean up at shutdown}}
>



