[ https://issues.apache.org/jira/browse/ARROW-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-12650:
-----------------------------------
    Fix Version/s:     (was: 5.0.0)
                   6.0.0

> [Doc][Python] Improve documentation regarding dealing with memory mapped files
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-12650
>                 URL: https://issues.apache.org/jira/browse/ARROW-12650
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Alessandro Molina
>            Assignee: Alessandro Molina
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 6.0.0
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> While one of Arrow's promises is that it makes it easy to read/write data 
> bigger than memory, it is not immediately obvious from the pyarrow 
> documentation how to deal with memory-mapped files.
> The doc hints that you can open files as memory mapped 
> ([https://arrow.apache.org/docs/python/memory.html?highlight=memory_map#on-disk-and-memory-mapped-files])
> but then it doesn't explain how to read/write Arrow Arrays or Tables from 
> there.
> While most high-level functions for reading/writing formats (Parquet, 
> Feather, ...) have an easy-to-guess {{memory_map=True}} option, the doc 
> doesn't seem to have any example of how that is meant to work for the 
> Arrow format itself, for example how to do it with {{RecordBatchFile*}}.
> An addition to the memory-mapping section with a more meaningful example 
> that reads/writes actual Arrow data (instead of plain bytes) would 
> probably be helpful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
