[ https://issues.apache.org/jira/browse/ARROW-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou updated ARROW-9633:
----------------------------------
Fix Version/s: 5.0.0 (was: 4.0.0)
> [C++] Do not toggle memory mapping globally in LocalFileSystem
> --------------------------------------------------------------
>
> Key: ARROW-9633
> URL: https://issues.apache.org/jira/browse/ARROW-9633
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 5.0.0
>
>
> In the context of the Datasets API, some file formats benefit greatly from
> memory mapping (like Arrow IPC files) while others benefit less. Additionally,
> in some scenarios, memory mapping can fail when used on network-attached
> storage devices. A single filesystem may be used to read different kinds of
> files, some with memory mapping and some without, and the Datasets API
> should be able to fall back on non-memory-mapped reads if the attempt to
> memory map fails. It would therefore make sense to have a non-global option
> for this:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/localfs.h
> I would suggest adding a new filesystem API, something like
> {{OpenMappedInputFile}}, with options to control the behavior when memory
> mapping is not possible. These options might include:
> * Falling back on a normal RandomAccessFile
> * Reading the entire file into memory (or even tmpfs?) and then wrapping it
> in a BufferReader
> * Failing
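
To make the proposal above concrete, here is a minimal sketch of a per-call fallback policy. It is written against plain POSIX mmap and the C++ standard library, not against Arrow. The {{OpenMappedInputFile}} name comes from the description, but the signature, the MmapFallback enum and the FileBytes type are assumptions made up for illustration. Only two of the three listed behaviors are covered (reading the whole file into memory, and failing). Falling back on a normal RandomAccessFile would need a file-handle abstraction that this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical per-call policy (not an Arrow type): what to do when mmap fails.
enum class MmapFallback { kFail, kReadIntoMemory };

// Hypothetical result type: either a live read-only mapping or an owned copy.
class FileBytes {
 public:
  FileBytes(void* mapping, std::size_t size) : mapping_(mapping), size_(size) {
    assert(mapping != nullptr);
  }
  explicit FileBytes(std::vector<char> copy)
      : copy_(std::move(copy)), size_(copy_.size()) {}
  FileBytes(FileBytes&& other) noexcept
      : mapping_(other.mapping_), copy_(std::move(other.copy_)), size_(other.size_) {
    other.mapping_ = nullptr;  // the moved-from object must not unmap
    other.size_ = 0;
  }
  ~FileBytes() {
    if (mapping_ != nullptr) munmap(mapping_, size_);
  }

  bool is_mapped() const { return mapping_ != nullptr; }
  std::size_t size() const { return size_; }
  const char* data() const {
    return mapping_ != nullptr ? static_cast<const char*>(mapping_) : copy_.data();
  }

 private:
  void* mapping_ = nullptr;
  std::vector<char> copy_;
  std::size_t size_ = 0;
};

// Tries to map `path` read-only. Returns nullptr if the file cannot be opened
// or mapped (mmap rejects zero-length files, so those count as failures too).
inline void* TryMap(const std::string& path, std::size_t* size) {
  int fd = open(path.c_str(), O_RDONLY);
  if (fd < 0) return nullptr;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return nullptr;
  }
  void* addr = mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ,
                    MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (addr == MAP_FAILED) return nullptr;
  *size = static_cast<std::size_t>(st.st_size);
  return addr;
}

// The policy is an argument of each call, not a global filesystem setting.
inline FileBytes OpenMappedInputFile(const std::string& path, MmapFallback fallback) {
  std::size_t size = 0;
  if (void* addr = TryMap(path, &size)) return FileBytes(addr, size);
  if (fallback == MmapFallback::kFail) {
    throw std::runtime_error("memory mapping failed for " + path);
  }
  // kReadIntoMemory: read the entire file into an owned buffer instead.
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
  return FileBytes(std::move(bytes));
}
```

With this shape, a caller reading an IPC file could pass MmapFallback::kReadIntoMemory to tolerate network-attached storage, while another caller using the same filesystem could pass MmapFallback::kFail. No shared state is toggled between the two calls.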