[
https://issues.apache.org/jira/browse/ARROW-13254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373906#comment-17373906
]
Weston Pace commented on ARROW-13254:
-------------------------------------
6x memory usage is not normal. This may be overhead from the dynamic
allocator, or it could be a bug. In fact, it sounds a bit like ARROW-12983.
You mentioned you did not encounter this in the past. Do you get this error
with the exact same data on 3.0.0? If you do not get the error, then
ARROW-12983 is most likely the culprit. Are you able to try with the latest
nightly build
(https://arrow.apache.org/docs/python/install.html#installing-nightly-packages)?
> [Python] Processes killed and semaphore objects leaked when reading pandas
> data
> -------------------------------------------------------------------------------
>
> Key: ARROW-13254
> URL: https://issues.apache.org/jira/browse/ARROW-13254
> Project: Apache Arrow
> Issue Type: Bug
> Environment: OS name and version: macOS 11.4
> Python version: 3.8.10
> Pyarrow version: 4.0.1
> Reporter: Koyomi Akaguro
> Priority: Major
>
> When I run {{pa.Table.from_pandas(df)}} for a >1 GB dataframe, it reports
>
> {{Killed: 9
> ../anaconda3/envs/py38/lib/python3.8/multiprocessing/resource_tracker.py:216:
> UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects
> to clean up at shutdown}}
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)