[
https://issues.apache.org/jira/browse/ARROW-13254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373876#comment-17373876
]
Koyomi Akaguro commented on ARROW-13254:
----------------------------------------
[~westonpace] I try several times by cutting my data to different size. I find
that when I use only 1/4 data, the memory used double from 3+G to 6G and then
is done smoothly. However when I use half of the data which is 4.4G, the memory
used again jump from 6+G to 60G and then killed. It seems that if my data is
large to some limit, the pa.Table.from_pandas function would just explode the
memory to unlimited high.
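
For reference, a possible workaround while this is investigated is to convert
the dataframe in row slices so that only one slice's conversion buffers are
live at a time. This is only a sketch: the helper name
table_from_pandas_chunked and the chunk_rows parameter are illustrative names,
not part of the pyarrow API.
{code:python}
import pandas as pd
import pyarrow as pa

def table_from_pandas_chunked(df: pd.DataFrame,
                              chunk_rows: int = 1_000_000) -> pa.Table:
    """Convert df to an Arrow table one row slice at a time."""
    pieces = []
    for start in range(0, len(df), chunk_rows):
        chunk = df.iloc[start:start + chunk_rows]
        # preserve_index=False drops the pandas index so every slice
        # produces an identical schema; nthreads=1 avoids per-thread
        # scratch buffers during conversion, at the cost of speed.
        pieces.append(pa.Table.from_pandas(chunk,
                                           preserve_index=False,
                                           nthreads=1))
    # concat_tables stitches the slices together as chunked columns
    # without copying the underlying buffers.
    return pa.concat_tables(pieces)

table = table_from_pandas_chunked(df, chunk_rows=500_000)
{code}
If peak memory then stays proportional to chunk_rows, that would point at a
per-conversion allocation blowup rather than the size of the final table.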
> [Python] Processes killed and semaphore objects leaked when reading pandas
> data
> -------------------------------------------------------------------------------
>
> Key: ARROW-13254
> URL: https://issues.apache.org/jira/browse/ARROW-13254
> Project: Apache Arrow
> Issue Type: Bug
> Environment: OS name and version: macOS 11.4
> Python version: 3.8.10
> Pyarrow version: 4.0.1
> Reporter: Koyomi Akaguro
> Priority: Major
>
> When I run {{pa.Table.from_pandas(df)}} on a >1 GB dataframe, it reports:
>
> {{Killed: 9
> ../anaconda3/envs/py38/lib/python3.8/multiprocessing/resource_tracker.py:216:
> UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects
> to clean up at shutdown}}
>
>