Re: [I] [Python][Parquet] 21.0.0 release introduced memory leak when reading parquet [arrow]

via GitHub Wed, 20 Aug 2025 11:28:53 -0700


lucasKoeb commented on issue #47266:
URL: https://github.com/apache/arrow/issues/47266#issuecomment-3207576731


   Experienced similar issues after updating to 21.0.0 and had to revert to 
20.0.0 for now.
   
   Could reproduce the issue using the provided sample script on Python 3.12.10 
on Linux and Windows.
   Tried some variations on the reproduction script:
   
   #### Using the ARROW_DEFAULT_MEMORY_POOL environment variable
   
   Setting the environment variable to `system` or `jemalloc` on Linux, and to 
`system` on Windows, worked. In these scenarios, memory usage as reported by 
`memory_profiler` stays under 2 GiB.
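   
   A minimal sketch of how this variation can be checked, assuming the variable has to be in place before `pyarrow` is first imported. The helper names `env_with_pool` and `report_backend` are made up for illustration; the child process only prints `pyarrow.default_memory_pool().backend_name`.

```python
import os
import subprocess
import sys

# One-liner run in a fresh interpreter: print the backend of the default pool.
CHILD_SNIPPET = "import pyarrow as pa; print(pa.default_memory_pool().backend_name)"


def env_with_pool(backend: str) -> dict:
    """Return a copy of the current environment with the Arrow pool override set."""
    env = dict(os.environ)
    env["ARROW_DEFAULT_MEMORY_POOL"] = backend  # e.g. "system", "jemalloc", "mimalloc"
    return env


def report_backend(backend: str) -> str:
    """Start a child Python with the override and return the backend it reports."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SNIPPET],
        env=env_with_pool(backend),
        capture_output=True,
        text=True,
        check=True,  # raises if pyarrow is missing or rejects the backend
    )
    return result.stdout.strip()
```

   Calling `report_backend("system")` in an environment with pyarrow installed should confirm which pool was picked up; running the reproduction script itself with the same environment is what exercises the leak.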
   
   #### Using `pyarrow.set_memory_pool()`
   
   Using `set_memory_pool` to change the default memory pool does not prevent 
the memory leak; memory grows by the same amounts and reaches the same peak as 
with `mimalloc`.
   
   Even though the call seems to work, and `pyarrow.default_memory_pool()` 
correctly reports the memory pool I have configured, this seems to have no 
practical effect on the allocations.
   
   Tested this by setting the pool to `pyarrow.system_memory_pool()` and 
`pyarrow.jemalloc_memory_pool()` on Linux and to `pyarrow.system_memory_pool()` 
on Windows.
   
   Passing a specific memory pool to the `to_table()` method is also 
ineffective.
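   
   For reference, a sketch of what this variation looks like in code. It assumes the read goes through `pyarrow.dataset` (the reproduction script is not shown here, so that is a guess), and `read_with_explicit_pool` and `POOL_FACTORY_NAMES` are hypothetical names.

```python
# Short backend name -> name of the pyarrow function that returns that pool.
POOL_FACTORY_NAMES = {
    "system": "system_memory_pool",
    "jemalloc": "jemalloc_memory_pool",  # not available in every build
}


def read_with_explicit_pool(path, backend="system"):
    """Set the default pool, pass the same pool to to_table(), return (table, pool)."""
    # Imported here so the module can be loaded before pyarrow is configured.
    import pyarrow as pa
    import pyarrow.dataset as ds

    pool = getattr(pa, POOL_FACTORY_NAMES[backend])()
    pa.set_memory_pool(pool)
    # This check passes in the tests described above, yet memory still grows.
    assert pa.default_memory_pool().backend_name == pool.backend_name

    # Assumption: the read is a dataset scan; the pool is also passed explicitly.
    table = ds.dataset(path, format="parquet").to_table(memory_pool=pool)
    return table, pool
```

   Comparing `pool.bytes_allocated()` with the process memory reported by `memory_profiler` after each call is one way to see whether the allocations actually land in the configured pool.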
   
   #### Calling `pa.default_memory_pool().release_unused()` in between iterations
   
   Changed the script to call `release_unused` after each read; tried on Linux 
and Windows. This does not prevent the leak, and the peak memory usage remains 
the same.
   
   ```python
   for _ in range(50):
       read()  # the Parquet read function from the reproduction script
       pyarrow.default_memory_pool().release_unused()
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
