lidavidm commented on a change in pull request #9730:
URL: https://github.com/apache/arrow/pull/9730#discussion_r596083089
##########
File path: docs/source/python/pandas.rst
##########
@@ -293,3 +293,19 @@ Used together, the call
will yield significantly lower memory usage in some scenarios. Without these
options, ``to_pandas`` will always double memory.
+
+Note that ``self_destruct=True`` is not guaranteed to save memory. Since the
+conversion happens column by column, memory is also freed column by column. But
+if multiple columns share an underlying allocation, then no memory will be
Review comment:
Yes, multiple arrays may share the same buffer. In IPC, for instance, we
read a record batch's worth of data from the file at a time, so all
arrays in that batch share the same buffer. In Flight, similarly, we receive a
record batch's worth of data from gRPC and (for implementation reasons)
concatenate it into a single buffer, so we end up in the same situation. (I
don't think this applies to, say, Parquet or CSV, since those formats require
actual decoding, but I haven't tested it.)
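
A rough sketch of why a shared buffer defeats column-by-column freeing. This
is not Arrow code: ``Allocation`` and ``Column`` are hypothetical stand-ins
for a buffer and an array that references a slice of it, and the result
relies on CPython's reference counting releasing objects as soon as the last
reference goes away.

```python
import weakref


class Allocation:
    """Hypothetical stand-in for one contiguous buffer (e.g. a record batch body)."""

    def __init__(self, size):
        self.data = bytearray(size)


class Column:
    """Hypothetical column: a slice of an allocation, which it keeps alive."""

    def __init__(self, allocation, offset, length):
        self.allocation = allocation
        self.view = memoryview(allocation.data)[offset:offset + length]


def make_shared_columns(n_columns, column_size):
    # One allocation backs every column, as in the IPC/Flight case above.
    allocation = Allocation(n_columns * column_size)
    columns = [
        Column(allocation, i * column_size, column_size)
        for i in range(n_columns)
    ]
    return columns, [weakref.ref(allocation)]


def make_separate_columns(n_columns, column_size):
    # Each column owns its own allocation.
    columns, refs = [], []
    for _ in range(n_columns):
        allocation = Allocation(column_size)
        columns.append(Column(allocation, 0, column_size))
        refs.append(weakref.ref(allocation))
    return columns, refs


def count_live(refs):
    # Number of allocations that have not been freed yet.
    return sum(1 for ref in refs if ref() is not None)


shared_columns, shared_refs = make_shared_columns(3, 1024)
separate_columns, separate_refs = make_separate_columns(3, 1024)

# Drop columns one at a time, as a column-by-column conversion would.
shared_live, separate_live = [], []
while shared_columns:
    shared_columns.pop()
    separate_columns.pop()
    shared_live.append(count_live(shared_refs))
    separate_live.append(count_live(separate_refs))

print(shared_live)    # the single shared allocation survives until the last column
print(separate_live)  # one allocation is released per dropped column
```

With the shared layout nothing is released until the final column is dropped,
which is the situation where ``self_destruct=True`` cannot lower peak memory.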