lucasKoeb commented on issue #47266: URL: https://github.com/apache/arrow/issues/47266#issuecomment-3207576731
Experienced similar issues after updating to 21.0.0 and had to revert to 20.0.0 for now. I could reproduce the issue using the provided sample script on Python 3.12.10 on Linux and Windows.

I tried some variations on the reproduction script:

#### Using the ARROW_DEFAULT_MEMORY_POOL environment variable

Setting the environment variable to change the default pool to `system` or `jemalloc` on Linux, and to `system` on Windows, worked. In these scenarios, memory usage as reported by `memory_profiler` stays under 2GiB.

#### Using `pyarrow.set_memory_pool()`

Using the `set_memory_pool` method to change the default memory pool does not prevent the memory leak, and results in the same increases and peak memory usage as `mimalloc`. Even though the call seems to work, and `pyarrow.default_memory_pool()` correctly reports the memory pool I configured, it seems to have no practical effect on the allocations. I tested this by setting the pool to `pyarrow.system_memory_pool()` and `pyarrow.jemalloc_memory_pool()` on Linux and to `pyarrow.system_memory_pool()` on Windows. Passing a specific memory pool to the `to_table()` method is also ineffective.

#### Calling `pa.default_memory_pool().release_unused()` in between iterations

I changed the script to call `release_unused` after each read, and tried it on Linux and Windows. This does not prevent the leak, and the peak memory usage remains the same.

```
for _ in range(50):
    read()
    pyarrow.default_memory_pool().release_unused()
```
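For anyone who wants to repeat the environment-variable comparison, here is a minimal sketch of one way to script it. It assumes the variable has to be in place before `pyarrow` is imported, so each backend runs in a fresh interpreter. The helper names `env_for_backend` and `run_with_backend` are hypothetical, and which backends are actually available depends on how the installed pyarrow was built.

```python
import os
import subprocess
import sys

# Backend names that ARROW_DEFAULT_MEMORY_POOL is documented to accept.
# Availability of jemalloc/mimalloc depends on the pyarrow build.
BACKENDS = ("system", "jemalloc", "mimalloc")


def env_for_backend(backend, base_env=None):
    """Return a copy of the environment with the Arrow default pool overridden."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["ARROW_DEFAULT_MEMORY_POOL"] = backend
    return env


def run_with_backend(script, backend):
    """Run `script` in a new interpreter so the variable is set before pyarrow loads."""
    return subprocess.run(
        [sys.executable, script],
        env=env_for_backend(backend),
        check=False,
    )
```

Calling `run_with_backend("repro.py", "system")` and then `run_with_backend("repro.py", "mimalloc")` would run the same script (here a placeholder file name) under each pool. Printing `pyarrow.default_memory_pool().backend_name` inside the script is a quick way to confirm which pool was picked up.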