[ https://issues.apache.org/jira/browse/ARROW-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170159#comment-17170159 ]
Wes McKinney commented on ARROW-9633:
-------------------------------------

Note that the similar filesystem API in TensorFlow has {{NewReadOnlyMemoryRegionFromFile}}. I'm not sure what its semantics are with remote filesystems.

> [C++] Do not toggle memory mapping globally in LocalFileSystem
> --------------------------------------------------------------
>
> Key: ARROW-9633
> URL: https://issues.apache.org/jira/browse/ARROW-9633
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 2.0.0
>
>
> In the context of the Datasets API, some file formats benefit greatly from
> memory mapping (like Arrow IPC files) while others benefit less. Additionally,
> in some scenarios, memory mapping could fail when used on network-attached
> storage devices. A single filesystem may be used to read different kinds of
> files, some with memory mapping and some without, and the Datasets API should
> be able to fall back on non-memory-mapped reads if the attempt to memory map
> fails. It would therefore make sense to have a non-global option for this:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/localfs.h
> I would suggest adding a new filesystem API with something like
> {{OpenMappedInputFile}}, with some options to control the behavior when memory
> mapping is not possible. These options may be among:
> * Falling back on a normal RandomAccessFile
> * Reading the entire file into memory (or even tmpfs?) and then wrapping it
> in a BufferReader
> * Failing
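
As a rough illustration of the per-call fallback options listed in the issue description, here is a minimal, self-contained sketch using plain POSIX {{mmap}} rather than Arrow's classes. Every name in it ({{MmapFallback}}, {{FileBytes}}, {{TryMmap}}, {{OpenMappedOrFallback}}, and the {{force_mmap_failure}} switch) is hypothetical and not part of Arrow; it covers only the "read the entire file into memory" and "fail" options, and assumes a POSIX system.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <fstream>
#include <iterator>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-call policy (not an Arrow type): what to do when
// memory mapping is not possible.
enum class MmapFallback { kFail, kReadIntoMemory };

// Hypothetical read-only view of a file's bytes, backed either by a
// memory mapping or by a heap copy.
class FileBytes {
 public:
  FileBytes(void* mapping, std::size_t size) : mapping_(mapping), size_(size) {}
  explicit FileBytes(std::vector<char> heap)
      : heap_(std::move(heap)), size_(heap_.size()) {}
  ~FileBytes() {
    if (mapping_ != nullptr) munmap(mapping_, size_);
  }
  FileBytes(const FileBytes&) = delete;
  FileBytes& operator=(const FileBytes&) = delete;

  const char* data() const {
    return mapping_ != nullptr ? static_cast<const char*>(mapping_)
                               : heap_.data();
  }
  std::size_t size() const { return size_; }
  bool is_mapped() const { return mapping_ != nullptr; }

 private:
  void* mapping_ = nullptr;
  std::vector<char> heap_;
  std::size_t size_ = 0;
};

// Returns nullptr if the file cannot be opened or mapped. Empty files are
// treated as unmappable because mmap rejects zero-length mappings.
std::unique_ptr<FileBytes> TryMmap(const std::string& path) {
  int fd = open(path.c_str(), O_RDONLY);
  if (fd < 0) return nullptr;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0) {
    close(fd);
    return nullptr;
  }
  const std::size_t size = static_cast<std::size_t>(st.st_size);
  void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (addr == MAP_FAILED) return nullptr;
  return std::make_unique<FileBytes>(addr, size);
}

// Tries to memory map `path`; if that fails, applies the caller's policy.
// `force_mmap_failure` only simulates storage where mapping is unavailable.
std::unique_ptr<FileBytes> OpenMappedOrFallback(const std::string& path,
                                                MmapFallback fallback,
                                                bool force_mmap_failure = false) {
  if (!force_mmap_failure) {
    if (auto mapped = TryMmap(path)) return mapped;
  }
  if (fallback == MmapFallback::kFail) {
    throw std::runtime_error("cannot memory map " + path);
  }
  // Fallback: read the whole file into a heap buffer.
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
  return std::make_unique<FileBytes>(std::move(bytes));
}
```

Because the policy is an argument to each call rather than a filesystem-wide flag, one caller can request mapping with a strict failure mode for Arrow IPC files while another reads a different format through the heap path on the same filesystem.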