MMCMA commented on issue #46929: URL: https://github.com/apache/arrow/issues/46929#issuecomment-3048139846
Not sure if the following issue belongs here, but it seems related: we observed a significant increase in memory consumption when upgrading from `pyarrow` 17.0.0 to later versions, particularly during calls to `.to_pandas()`. For example, comparing memory usage between `pyarrow` 17.0.0 and 20.0.0:

```
tbl_a : 14.0 GB -> 19.3 GB
tbl_b : 17.9 GB -> 39.3 GB
tbl_c : 23.7 GB -> 46.4 GB
```

The function call is simply:

```python
tbl.to_pandas(self_destruct=True, split_blocks=True)
```

Has anything changed in the `.to_pandas()` implementation or memory management post-17.0.0 that could explain this behavior? Other than downgrading back to 17.0.0, is there a recommended way to mitigate the increased memory usage?
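
For reference, a minimal sketch of how the comparison above could be reproduced across `pyarrow` versions. The table here is a synthetic placeholder standing in for `tbl_a`/`tbl_b`/`tbl_c`, and `psutil` is used only to read the process RSS; RSS after the call is a coarse proxy, since peak usage during the conversion may be higher.

```python
import os

import numpy as np
import psutil
import pyarrow as pa


def rss_gb() -> float:
    """Resident set size of the current process in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1e9


# Synthetic stand-in for the real tables; adjust n_rows to approximate the real sizes.
n_rows = 50_000_000
tbl = pa.table({
    "a": np.random.rand(n_rows),
    "b": np.random.randint(0, 1_000, n_rows),
})

before = rss_gb()
# Same call as in the report above.
df = tbl.to_pandas(self_destruct=True, split_blocks=True)
after = rss_gb()

print(f"pyarrow {pa.__version__}: RSS {before:.1f} GB -> {after:.1f} GB")
```

Running this under 17.0.0 and 20.0.0 in otherwise identical environments should show whether the regression reproduces independently of the specific data.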