[
https://issues.apache.org/jira/browse/ARROW-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514324#comment-17514324
]
Will Jones edited comment on ARROW-16037 at 3/29/22, 8:48 PM:
--------------------------------------------------------------
If that doesn't work, then that suggests you are actually running PyArrow <
3.0.0. I recommend double-checking with {{pa.\_\_version\_\_}}. See:
https://arrow.apache.org/docs/3.0/python/generated/pyarrow.MemoryPool.html
> Possible memory leak in compute.take
> ------------------------------------
>
> Key: ARROW-16037
> URL: https://issues.apache.org/jira/browse/ARROW-16037
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 6.0.1
> Environment: Ubuntu
> Reporter: Ziheng Wang
> Priority: Blocker
>
> If you run the following code, the memory usage of the process goes up to 1GB
> even though the pyarrow allocated bytes is always at ~80MB. The process
> memory comes down after a while to 800 MB, but is still way more than what is
> necessary.
> '''
> import pyarrow as pa
> import numpy as np
> import pandas as pd
> import os, psutil
> import pyarrow.compute as compute
> import gc
> my_table = pa.Table.from_pandas(pd.DataFrame(np.random.normal(size=(10000, 1000))))
> process = psutil.Process(os.getpid())
> print("mem usage", process.memory_info().rss, pa.total_allocated_bytes())
> for i in range(100):
>     print("mem usage", process.memory_info().rss, pa.total_allocated_bytes())
>     temp = compute.sort_indices(my_table['0'], sort_keys=[('0', 'ascending')])
>     my_table = my_table.take(temp)
>     gc.collect()
> '''
--
This message was sent by Atlassian Jira
(v8.20.1#820001)