Alessandro Molina created ARROW-12650:
-----------------------------------------
Summary: [Python] Improve documentation regarding dealing with
memory mapped files
Key: ARROW-12650
URL: https://issues.apache.org/jira/browse/ARROW-12650
Project: Apache Arrow
Issue Type: Improvement
Reporter: Alessandro Molina
While one of Arrow's promises is that it makes it easy to read/write data larger
than memory, it's not immediately obvious from the pyarrow documentation how to
work with memory-mapped files.
We hint that you can open files as memory-mapped (
[https://arrow.apache.org/docs/python/memory.html?highlight=memory_map#on-disk-and-memory-mapped-files]
), but we never go on to explain how to read or write Arrow Arrays or Tables
through them.
While most high-level readers for other formats (Parquet, Feather, ...) have an
easy-to-guess {{memory_map=True}} option, we don't have any example of how
memory mapping is meant to work for the Arrow format itself, for example how to
do it using {{RecordBatchFile*}}.
An addition to the memory mapping section with a more meaningful example, one
that reads and writes actual Arrow data (instead of plain bytes), would probably
be more helpful.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)