Ziheng Wang created ARROW-16037:
-----------------------------------

             Summary: Possible memory leak in compute.take
                 Key: ARROW-16037
                 URL: https://issues.apache.org/jira/browse/ARROW-16037
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 6.0.1
         Environment: Ubuntu
            Reporter: Ziheng Wang


Running the following code drives the resident memory (RSS) of the process up
to about 1 GB, even though pa.total_allocated_bytes() stays at roughly 80 MB.
After a while the process memory drops to about 800 MB, which is still far
more than necessary.

'''

import pyarrow as pa
import numpy as np
import pandas as pd
import os, psutil
import pyarrow.compute as compute
import gc
my_table = pa.Table.from_pandas(pd.DataFrame(np.random.normal(size=(10000,1000))))

process = psutil.Process(os.getpid())
print("mem usage", process.memory_info().rss, pa.total_allocated_bytes())

for i in range(100):
    print("mem usage", process.memory_info().rss, pa.total_allocated_bytes())
    temp = compute.sort_indices(my_table['0'], sort_keys = [('0','ascending')])
    my_table = my_table.take(temp)
    gc.collect()

'''
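
To make the gap between the two numbers easier to read, a small helper along
these lines could print RSS next to Arrow's own accounting. This is only a
sketch: format_sample and sample are hypothetical names introduced here for
illustration, not part of pyarrow, and it assumes psutil and pyarrow are
installed as in the script above.

```python
import os

MIB = 1024 * 1024


def format_sample(step, rss_bytes, arrow_bytes):
    """Format one reading; the gap is memory the process holds beyond Arrow's count."""
    gap = rss_bytes - arrow_bytes
    return (
        f"step {step:3d}  rss={rss_bytes / MIB:.1f} MiB  "
        f"arrow={arrow_bytes / MIB:.1f} MiB  gap={gap / MIB:.1f} MiB"
    )


def sample(step):
    """Take one reading from the current process (hypothetical helper)."""
    # Imported lazily so format_sample can be used without these packages.
    import psutil
    import pyarrow as pa

    rss = psutil.Process(os.getpid()).memory_info().rss
    return format_sample(step, rss, pa.total_allocated_bytes())
```

Calling print(sample(i)) at the top of the loop, in place of the existing
print, would show the same two values plus their difference in MiB.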


