[ 
https://issues.apache.org/jira/browse/ARROW-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Molina updated ARROW-12650:
--------------------------------------
    Description: 
While one of Arrow's promises is that it makes it easy to read/write data 
bigger than memory, it's not immediately obvious from the pyarrow 
documentation how to deal with memory-mapped files.

The doc hints that you can open files as memory mapped ( 
[https://arrow.apache.org/docs/python/memory.html?highlight=memory_map#on-disk-and-memory-mapped-files]
 ) but then it doesn't explain how to read/write Arrow Arrays or Tables from 
there.

While most high-level functions to read/write formats (Parquet, Feather, ...) 
have an easy-to-guess {{memory_map=True}} option, the doc doesn't seem to have 
any example of how that is meant to work for the Arrow format itself, for 
example how to do that using {{RecordBatchFile*}}.

Adding a more meaningful example to the memory mapping section, one that 
reads/writes actual Arrow data (instead of plain bytes), would probably be 
more helpful.

  was:
While one of the Arrow promises is that it makes easy to read/write data bigger 
than memory, it's not immediately obvious from the pyarrow documentation how to 
deal with memory mapped files.

We hint that you can open files as memory mapped ( 
[https://arrow.apache.org/docs/python/memory.html?highlight=memory_map#on-disk-and-memory-mapped-files]
 ) but then we don't explain how to read/write Arrow Arrays or Tables from 
there.

While most high level functions to read/write formats (pqt, feather, ...) have 
an easy to guess {{memory_map=True}} option, we don't have any example of how 
that is meant to work for Arrow format itself. For example how you can do that 
using {{RecordBatchFile*}}. 

An addition to the memory mapping section that makes a more meaningful example 
that reads/writes actual arrow data (instead of plain bytes) would probably be 
more helpful


> [Doc][Python] Improve documentation regarding dealing with memory mapped files
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-12650
>                 URL: https://issues.apache.org/jira/browse/ARROW-12650
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Alessandro Molina
>            Assignee: Alessandro Molina
>            Priority: Minor
>
> While one of Arrow's promises is that it makes it easy to read/write data 
> bigger than memory, it's not immediately obvious from the pyarrow 
> documentation how to deal with memory-mapped files.
> The doc hints that you can open files as memory mapped ( 
> [https://arrow.apache.org/docs/python/memory.html?highlight=memory_map#on-disk-and-memory-mapped-files]
>  ) but then it doesn't explain how to read/write Arrow Arrays or Tables from 
> there.
> While most high-level functions to read/write formats (Parquet, Feather, 
> ...) have an easy-to-guess {{memory_map=True}} option, the doc doesn't seem 
> to have any example of how that is meant to work for the Arrow format 
> itself, for example how to do that using {{RecordBatchFile*}}.
> Adding a more meaningful example to the memory mapping section, one that 
> reads/writes actual Arrow data (instead of plain bytes), would probably be 
> more helpful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
