Wes McKinney created ARROW-6570:
-----------------------------------
Summary: [Python] Use MemoryPool to allocate memory for NumPy
arrays in to_pandas calls
Key: ARROW-6570
URL: https://issues.apache.org/jira/browse/ARROW-6570
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Wes McKinney
Fix For: 0.15.0
It occurred to me that we can likely improve the performance and scalability of
{{Table.to_pandas}} or other {{to_pandas}} methods by using the active
MemoryPool to allocate memory for the array rather than letting NumPy use the
system allocator. We would need to use the {{PyCapsule}} approach to setting a
{{shared_ptr<Buffer>}} as the base of the created NumPy arrays.
This has the additional benefit of tracking NumPy-related allocations in the
MemoryPool so we will have a more precise accounting of allocated memory.