[GitHub] [arrow] pitrou commented on a change in pull request #9730: ARROW-9878: [Python] Document caveats of to_pandas(self_destruct=True)

GitBox Wed, 17 Mar 2021 07:20:57 -0700


pitrou commented on a change in pull request #9730:
URL: https://github.com/apache/arrow/pull/9730#discussion_r596063175




##########
File path: docs/source/python/pandas.rst
##########
@@ -293,3 +293,19 @@ Used together, the call
 
 will yield significantly lower memory usage in some scenarios. Without these
 options, ``to_pandas`` will always double memory.
+
+Note that ``self_destruct=True`` is not guaranteed to save memory. Since the
+conversion happens column by column, memory is also freed column by column. But
+if multiple columns share an underlying allocation, then no memory will be
+freed until all of those columns are converted. In particular, data that comes
+from IPC or Flight is prone to this, as memory will be laid out as follows::
+
+  Record Batch 0: Allocation 0: array 0 chunk 0, array 1 chunk 0, ...
+  Record Batch 1: Allocation 1: array 0 chunk 1, array 1 chunk 1, ...
+  ...
+
+In this case, no memory can be freed until the entire table is converted, even
+with ``self_destruct=True``.
+
+Additionally, even if memory is freed by Arrow, depending on the allocator in
+use, the memory may not be returned to the operating system immediately.

Review comment:
       This is true of all deallocations, so I don't think it is useful to
mention it specifically here.
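
As an aside on the shared-allocation caveat in the hunk above: the effect can be illustrated with a toy model. This is only a sketch in plain Python, not pyarrow code. The function name `freed_after_each_column`, the `layout` mapping, and the byte sizes are all hypothetical and exist only for illustration. The model assumes an allocation is released once every column that references it has been converted.

```python
def freed_after_each_column(layout, sizes):
    """Return cumulative bytes freed after converting each column in order.

    layout[i] lists the (hypothetical) allocation ids backing column i's chunks.
    sizes maps an allocation id to its size in bytes.
    """
    # Count how many columns still reference each allocation.
    remaining = {}
    for allocs in layout:
        for alloc in set(allocs):
            remaining[alloc] = remaining.get(alloc, 0) + 1

    freed = []
    total = 0
    for allocs in layout:  # conversion happens column by column
        for alloc in set(allocs):
            remaining[alloc] -= 1
            if remaining[alloc] == 0:  # last column using it is done
                total += sizes[alloc]
        freed.append(total)
    return freed


# IPC/Flight-like layout: each record batch is one allocation that is
# shared by every column (3 columns, 2 batches).
shared = [[0, 1], [0, 1], [0, 1]]
shared_sizes = {0: 300, 1: 300}

# Layout where every column chunk has its own allocation.
separate = [[0, 1], [2, 3], [4, 5]]
separate_sizes = {i: 100 for i in range(6)}

print(freed_after_each_column(shared, shared_sizes))
print(freed_after_each_column(separate, separate_sizes))
```

In the shared layout nothing is released until the last column is converted, which is the situation the new paragraph describes. In the separate layout memory is released steadily as each column is converted.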




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
