Weston Pace created ARROW-15017:
-----------------------------------

             Summary: [Python][C++] pyarrow.ipc.RecordBatchFileReader holding 
onto memory after being disposed
                 Key: ARROW-15017
                 URL: https://issues.apache.org/jira/browse/ARROW-15017
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
            Reporter: Weston Pace


I'll attach a full reproduction but the important bit is here:

{noformat}
  with ipc.RecordBatchFileReader(path) as reader:
    table = reader.read_all()
  # If you comment out this next line then memory usage will be worse
  del reader
  df = table.to_pandas()
  del table
  gc.collect()
{noformat}

The input file is ~3GB.  Peak memory usage is ~6GB because of the conversion 
to pandas, and over time jemalloc returns the excess 3GB.

However, if I do not run "del reader" then peak usage is ~9GB and it only 
shrinks down to 6GB, even after a 5 second wait.  Since the reader has already 
been disposed by the "with" block, this was rather surprising to me.
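
For context on why "del reader" can matter at all, here is a minimal 
pure-Python sketch.  It makes no claim about where Arrow actually keeps the 
memory.  FakeReader and reader_lifetime are hypothetical names invented for 
this illustration and are not part of pyarrow.  The sketch only shows that 
leaving a "with" block calls __exit__ but leaves the name bound, so anything 
the object still owns stays alive until the name is deleted or rebound.

```python
import gc
import weakref


class FakeReader:
    """Hypothetical stand-in for a reader that owns a large buffer."""

    def __init__(self, size):
        self.buffer = bytearray(size)  # pretend this is the file's data
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Assumption for this sketch: closing marks the reader as closed
        # but does not drop the buffer it owns.
        self.closed = True
        return False


def reader_lifetime(size=1024):
    """Return (closed, alive_after_with, alive_after_del)."""
    with FakeReader(size) as reader:
        ref = weakref.ref(reader)
    # The with block has exited, but "reader" is still a live name.
    closed = reader.closed
    alive_after_with = ref() is not None
    del reader
    gc.collect()
    alive_after_del = ref() is not None
    return closed, alive_after_with, alive_after_del


print(reader_lifetime())
```

In this sketch the object is closed yet still alive after the "with" block, 
and is only freed once "del reader" runs.  Whether the real reader should 
release its buffers on close is the question this issue raises.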



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
