[ 
https://issues.apache.org/jira/browse/ARROW-17441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581015#comment-17581015
 ] 

Weston Pace commented on ARROW-17441:
-------------------------------------

ReleaseUnused also only applies to pool-allocated memory.  Here we do NOT 
appear to be dealing with pool-allocated memory, since "Total Allocated 
Bytes" is 0, so we shouldn't focus too much on mimalloc.  I suspect the 
culprit is plain old malloc.  However, a change in mimalloc's behavior *could* 
cause a change in malloc's behavior.  When reading Parquet the allocation 
pattern is roughly:

Lots of small flatbuffers allocations
Large buffer allocations
Lots of small flatbuffers allocations
Large buffer allocations

So if the behavior of the "Large buffer allocations" changed, the interleaved 
non-buffer allocations could end up fragmented differently, although this 
seems unlikely.  Maybe something else changed between Arrow 8 and 9, such as 
the clang or glibc version?
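
To make the interleaving concern concrete, here is a toy model, not a 
description of how malloc or mimalloc actually work.  It assumes a single 
linear heap that can only give memory back to the OS by trimming free space 
at its top, and it ignores mmap-backed large allocations, size classes and 
arenas.  The names (Block, build_heap, heap_stats) and the sizes are made up 
for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    size: int
    large: bool
    live: bool = True


def build_heap(rounds, small_count, small_size, large_size):
    """Lay blocks out in address order, following the interleaved pattern."""
    heap = []
    for _ in range(rounds):
        # Lots of small metadata-like allocations...
        heap.extend(Block(small_size, large=False) for _ in range(small_count))
        # ...followed by one large buffer allocation.
        heap.append(Block(large_size, large=True))
    return heap


def free_large(heap):
    """Free every large buffer while the small allocations stay alive."""
    for block in heap:
        if block.large:
            block.live = False


def heap_stats(heap):
    free = sum(b.size for b in heap if not b.live)
    # Assumption of this toy model: only the contiguous free run at the
    # top of the heap can be returned to the OS.
    trimmable = 0
    for block in reversed(heap):
        if block.live:
            break
        trimmable += block.size
    return {
        "heap_bytes": sum(b.size for b in heap),
        "free_bytes": free,
        "trimmable_bytes": trimmable,
        "pinned_free_bytes": free - trimmable,
    }


heap = build_heap(rounds=4, small_count=100, small_size=64, large_size=1 << 20)
free_large(heap)
stats = heap_stats(heap)
print(stats)
```

In this model most of the free bytes sit below a live small allocation, so 
they count as free but cannot be trimmed.  Whether the real allocators 
involved here behave anything like this is exactly what would need to be 
proven.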

This theory is maybe reinforced by the fact that this only happens on Mac (is 
this correct?  I can't reproduce it on Ubuntu.  Maybe it is clang-specific?), 
since mimalloc is probably fairly similar between macOS and Linux while the 
system allocator might not be.  Debugging (and fixing) fragmentation is going 
to be pretty tricky.

First, we should prove this is indeed a fragmentation issue.  It would be very 
helpful for our memory investigation efforts to have some way to visualize or 
calculate the amount of fragmentation.  Sadly, some brief googling suggests 
that this is a tricky task.  Maybe the simplest thing to do is to replace the 
global allocator with jemalloc, which has APIs that can dump much more 
detailed statistics.  However, doing this might itself fix the fragmentation, 
since you would no longer be using the standard allocator.
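
Short of a real fragmentation metric, a crude upper bound can be computed 
from the numbers in the quoted report below: the RSS that remains above the 
starting baseline and is not accounted for by the pool.  The sketch assumes 
the pool reports 0 allocated bytes at the end of every run (in practice that 
figure could be read from pa.default_memory_pool().bytes_allocated()), and 
the helper name unaccounted_bytes is made up for illustration.  The result 
also includes allocator caches and the Python heap, so it bounds 
fragmentation rather than measuring it.

```python
def unaccounted_bytes(rss, baseline_rss, pool_bytes=0):
    """RSS growth over the baseline that the memory pool does not explain."""
    return rss - baseline_rss - pool_bytes


# (baseline RSS, RSS after "waiting 10 seconds"), copied from the quoted log.
runs = {
    "mimalloc": (44_449_792, 1_819_852_800),
    "jemalloc": (45_629_440, 699_023_360),
    "system": (44_875_776, 540_311_552),
}

# pool_bytes is left at 0, which is an assumption for all three runs.
retained = {
    name: unaccounted_bytes(final, baseline)
    for name, (baseline, final) in runs.items()
}

for name, value in retained.items():
    print(f"{name}: {value:,} bytes retained outside the pool")
```

Under those assumptions this gives roughly 1.78 GB for mimalloc, 653 MB for 
jemalloc and 495 MB for the system pool.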


> [Python] Memory kept after del and pool.released_unused()
> ---------------------------------------------------------
>
>                 Key: ARROW-17441
>                 URL: https://issues.apache.org/jira/browse/ARROW-17441
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 9.0.0
>            Reporter: Will Jones
>            Priority: Major
>
> I was trying to reproduce another issue involving memory pools not releasing 
> memory, but encountered this confusing behavior: if I create a table, then 
> call {{del table}}, and then {{pool.release_unused()}}, I still see 
> significant memory usage. On mimalloc in particular, I see no meaningful drop 
> in memory usage on either call.
> Am I missing something? My understanding has been that memory will be 
> held onto by a memory pool, but will be freed by release_unused, and 
> that the system memory pool should release memory immediately. But neither of 
> those seems true.
> {code:python}
> import os
> import psutil
> import time
> import gc
> process = psutil.Process(os.getpid())
> import numpy as np
> from uuid import uuid4
> import pyarrow as pa
> def gen_batches(n_groups=200, rows_per_group=200_000):
>     for _ in range(n_groups):
>         id_val = uuid4().bytes
>         yield pa.table({
>             "x": np.random.random(rows_per_group), # This will compress poorly
>             "y": np.random.random(rows_per_group),
>             "a": pa.array(list(range(rows_per_group)), type=pa.int32()), # This compresses with delta encoding
>             "id": pa.array([id_val] * rows_per_group), # This compresses with RLE
>         })
> def print_rss():
>     print(f"RSS: {process.memory_info().rss:,} bytes")
> print(f"memory_pool={pa.default_memory_pool().backend_name}")
> print_rss()
> print("reading table")
> tab = pa.concat_tables(list(gen_batches()))
> print_rss()
> print("deleting table")
> del tab
> gc.collect()
> print_rss()
> print("releasing unused memory")
> pa.default_memory_pool().release_unused()
> print_rss()
> print("waiting 10 seconds")
> time.sleep(10)
> print_rss()
> {code}
> {code:none}
> > ARROW_DEFAULT_MEMORY_POOL=mimalloc python test_pool.py && \
>     ARROW_DEFAULT_MEMORY_POOL=jemalloc python test_pool.py && \
>     ARROW_DEFAULT_MEMORY_POOL=system python test_pool.py
> memory_pool=mimalloc
> RSS: 44,449,792 bytes
> reading table
> RSS: 1,819,557,888 bytes
> deleting table
> RSS: 1,819,590,656 bytes
> releasing unused memory
> RSS: 1,819,852,800 bytes
> waiting 10 seconds
> RSS: 1,819,852,800 bytes
> memory_pool=jemalloc
> RSS: 45,629,440 bytes
> reading table
> RSS: 1,668,677,632 bytes
> deleting table
> RSS: 698,400,768 bytes
> releasing unused memory
> RSS: 699,023,360 bytes
> waiting 10 seconds
> RSS: 699,023,360 bytes
> memory_pool=system
> RSS: 44,875,776 bytes
> reading table
> RSS: 1,713,569,792 bytes
> deleting table
> RSS: 540,311,552 bytes
> releasing unused memory
> RSS: 540,311,552 bytes
> waiting 10 seconds
> RSS: 540,311,552 bytes
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
