[ https://issues.apache.org/jira/browse/ARROW-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170159#comment-17170159 ]
Wes McKinney commented on ARROW-9633:
-------------------------------------
Note that a similar filesystem API in TensorFlow provides
{{NewReadOnlyMemoryRegionFromFile}}. I'm not sure what its semantics are
with remote filesystems.
> [C++] Do not toggle memory mapping globally in LocalFileSystem
> --------------------------------------------------------------
>
> Key: ARROW-9633
> URL: https://issues.apache.org/jira/browse/ARROW-9633
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 2.0.0
>
>
> In the context of the Datasets API, some file formats benefit greatly from
> memory mapping (like Arrow IPC files) while others benefit less. Additionally,
> in some scenarios, memory mapping could fail when used on network-attached
> storage devices. A single filesystem may be used to read different kinds of
> files, some with memory mapping and some without, and the Datasets API
> should be able to fall back on non-memory-mapped reads if the attempt to
> memory map fails. It would therefore make sense to have a non-global option
> for this:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/localfs.h
> I would suggest adding a new filesystem API, something like
> {{OpenMappedInputFile}}, with options to control the behavior when memory
> mapping is not possible. The options might include:
> * Falling back on a normal RandomAccessFile
> * Reading the entire file into memory (or even tmpfs?) and then wrapping it
> in a BufferReader
> * Failing
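
To make the three fallback behaviors listed in the description concrete, here is a rough standalone sketch of what a per-call policy could look like. It does not use Arrow: the {{MmapFallback}} enum, the {{OpenedFile}} struct, the {{CanMemoryMap}} probe and the signature of {{OpenMappedInputFile}} are all hypothetical names chosen for illustration, and the result only records which path was taken rather than returning a real file handle. The probe assumes a POSIX system.

```cpp
#include <cassert>
#include <functional>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <utility>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical per-call policy (not an Arrow type): what to do when the
// attempt to memory map a file fails.
enum class MmapFallback { kRandomAccessFile, kReadIntoMemory, kFail };

// Which kind of reader the caller ended up with.
enum class OpenedAs { kMemoryMapped, kRandomAccessFile, kInMemoryBuffer };

struct OpenedFile {
  OpenedAs kind;
  std::string data;  // filled only when the file was read fully into memory
};

using MapAttempt = std::function<bool(const std::string&)>;

// Probe whether `path` can be memory mapped (POSIX only). A real
// implementation would keep the mapping instead of releasing it.
bool CanMemoryMap(const std::string& path) {
  int fd = ::open(path.c_str(), O_RDONLY);
  if (fd < 0) return false;
  struct stat st;
  bool ok = false;
  if (::fstat(fd, &st) == 0 && st.st_size > 0) {
    const size_t size = static_cast<size_t>(st.st_size);
    void* addr = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr != MAP_FAILED) {
      ok = true;
      ::munmap(addr, size);
    }
  }
  ::close(fd);
  return ok;
}

// Try memory mapping first; if that fails, apply the caller's policy.
// The policy is an argument of the call, not global filesystem state.
OpenedFile OpenMappedInputFile(const std::string& path, MmapFallback fallback,
                               const MapAttempt& try_map = CanMemoryMap) {
  assert(!path.empty());
  if (try_map(path)) return {OpenedAs::kMemoryMapped, {}};

  switch (fallback) {
    case MmapFallback::kRandomAccessFile:
      // Stand-in for opening an ordinary (non-mapped) random access file.
      return {OpenedAs::kRandomAccessFile, {}};
    case MmapFallback::kReadIntoMemory: {
      // Stand-in for reading everything and wrapping it in a buffer reader.
      std::ifstream in(path, std::ios::binary);
      if (!in) throw std::runtime_error("cannot read " + path);
      std::string data((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
      return {OpenedAs::kInMemoryBuffer, std::move(data)};
    }
    case MmapFallback::kFail:
      break;
  }
  throw std::runtime_error("memory mapping failed for " + path);
}
```

The mapping attempt is injectable ({{try_map}}) only so the fallback paths can be exercised without a filesystem on which mmap really fails. With a policy passed per call, a Datasets scan could ask for mapping on IPC files and a plain read elsewhere while sharing one filesystem object.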