[ 
https://issues.apache.org/jira/browse/ARROW-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170159#comment-17170159
 ] 

Wes McKinney commented on ARROW-9633:
-------------------------------------

Note that the similar filesystem API in TensorFlow has a 
{{NewReadOnlyMemoryRegionFromFile}} method. I'm not sure what its semantics 
are with remote filesystems.

> [C++] Do not toggle memory mapping globally in LocalFileSystem
> --------------------------------------------------------------
>
>                 Key: ARROW-9633
>                 URL: https://issues.apache.org/jira/browse/ARROW-9633
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 2.0.0
>
>
> In the context of the Datasets API, some file formats benefit greatly from 
> memory mapping (like Arrow IPC files) while others benefit less. 
> Additionally, in some scenarios, memory mapping can fail when used on 
> network-attached storage devices. A single filesystem may be used to read 
> different kinds of files, some with memory mapping and some without, and the 
> Datasets API should be able to fall back on non-memory-mapped reads if the 
> attempt to memory map fails. It would therefore make sense to have a 
> non-global option for this:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/localfs.h
> I would suggest adding a new filesystem API, something like 
> {{OpenMappedInputFile}}, with options to control the behavior when memory 
> mapping is not possible. These options may be among:
> * Falling back on a normal RandomAccessFile
> * Reading the entire file into memory (or even tmpfs?) and then wrapping it 
> in a BufferReader
> * Failing
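
The three fallback options listed above could be modelled roughly as follows. 
This is only a sketch under stated assumptions, not Arrow code: the 
MmapFallback enum, the OpenResult struct, the TryMap callback (a stand-in for 
the real platform mmap attempt) and this signature for OpenMappedInputFile 
are all hypothetical, and it uses only the C++ standard library so that the 
per-call policy choice is easy to see.

```cpp
#include <fstream>
#include <functional>
#include <iterator>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of these are existing Arrow APIs.

// What to do when the memory-mapping attempt fails.
enum class MmapFallback { kRandomAccessFile, kReadIntoMemory, kFail };

// How the file ended up being opened.
enum class OpenedAs { kMemoryMapped, kRandomAccessFile, kInMemoryBuffer };

struct OpenResult {
  OpenedAs kind;
  std::vector<char> buffer;  // filled only for kInMemoryBuffer
};

// Stand-in for the platform mmap call: returns true if mapping succeeded.
using TryMap = std::function<bool(const std::string& path)>;

// The fallback policy is passed per call instead of being a global toggle
// on the filesystem object.
OpenResult OpenMappedInputFile(const std::string& path, MmapFallback fallback,
                               const TryMap& try_map) {
  if (try_map(path)) {
    return {OpenedAs::kMemoryMapped, {}};
  }
  switch (fallback) {
    case MmapFallback::kRandomAccessFile:
      // A real implementation would open a regular random-access file here.
      return {OpenedAs::kRandomAccessFile, {}};
    case MmapFallback::kReadIntoMemory: {
      // Read the whole file into memory; a real implementation would then
      // wrap the bytes in a buffer-backed reader.
      std::ifstream in(path, std::ios::binary);
      if (!in) throw std::runtime_error("cannot open " + path);
      std::vector<char> data((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
      return {OpenedAs::kInMemoryBuffer, std::move(data)};
    }
    case MmapFallback::kFail:
      break;
  }
  throw std::runtime_error("memory mapping failed for " + path);
}
```

Because the policy travels with each call, one filesystem instance could map 
Arrow IPC files while reading other formats without mapping, and a caller 
such as the Datasets API could choose a fallback per file.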



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
