[ https://issues.apache.org/jira/browse/ARROW-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861084#comment-16861084 ]
Romain François commented on ARROW-5502:
----------------------------------------

You can memory map right now, although at this point data is being copied to R vectors rather than borrowed from the memory-mapped file; we'll need to use ALTREP to go further.

The file argument of most reading functions may be an instance of arrow::io::MemoryMappedFile, which you get by using the mmap_open() function in R:

{code}
library(arrow, warn.conflicts = FALSE)
library(tibble)

tf <- tempfile()
write.csv(iris, tf, row.names = FALSE, quote = FALSE)

f <- mmap_open(tf)
f
#> arrow::io::MemoryMappedFile

tab <- read_csv_arrow(f)
as_tibble(tab)
#> # A tibble: 150 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>
#>  1          5.1         3.5          1.4         0.2 setosa
#>  2          4.9         3            1.4         0.2 setosa
#>  3          4.7         3.2          1.3         0.2 setosa
#>  4          4.6         3.1          1.5         0.2 setosa
#>  5          5           3.6          1.4         0.2 setosa
#>  6          5.4         3.9          1.7         0.4 setosa
#>  7          4.6         3.4          1.4         0.3 setosa
#>  8          5           3.4          1.5         0.2 setosa
#>  9          4.4         2.9          1.4         0.2 setosa
#> 10          4.9         3.1          1.5         0.1 setosa
#> # … with 140 more rows
{code}

Created on 2019-06-11 by the [reprex package|https://reprex.tidyverse.org/] (v0.3.0.9000)

> [R] file readers should mmap
> ----------------------------
>
>                 Key: ARROW-5502
>                 URL: https://issues.apache.org/jira/browse/ARROW-5502
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Priority: Major
>             Fix For: 0.14.0
>
>
> Arrow is supposed to let you work with datasets bigger than memory. Memory
> mapping is a big part of that. It should be the default way that files are
> read in the `read_*` functions. To disable memory mapping, we could use a
> global `option()`, or a function argument, but that might clutter the
> interface. Or we could not give a choice and only fall back to not memory
> mapping if the platform/file system doesn't support it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)