I'm using this PR as a WIP to explore how to efficiently transfer data from R to Spark using Arrow.
This PR might ultimately be closed rather than merged, but I thought it would be good
to give visibility into what I'm exploring.
Specifically, I'm working on supporting efficient execution of:
```r
library(sparklyr)
sc <- spark_connect(master = "local")

# Time copying a 1,000,000-row data frame into Spark
system.time({
  tbl_data <- sdf_copy_to(sc, data.frame(y = runif(10^6, 0, 1)), "data",
                          overwrite = TRUE)
})
```
Currently, without this PR or Arrow:
```r
system.time({
  tbl_data <- sdf_copy_to(sc, data.frame(y = runif(10^6, 0, 1)), "data",
                          overwrite = TRUE)
})
```
```
user system elapsed
1.120 0.087 3.482
```
With Arrow loaded, this drops to:
```r
library(arrow)
system.time({
  tbl_data <- sdf_copy_to(sc, data.frame(y = runif(10^6, 0, 1)), "data",
                          overwrite = TRUE)
})
```
```
user system elapsed
0.222 0.029 0.641
```
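One plausible contributor to this gap is that a columnar binary format lets a numeric column move as a single contiguous block of 8-byte doubles, instead of formatting each value as text. The base-R sketch below only illustrates that contrast; it is not sparklyr's or Arrow's actual code path, and the text encoding is an assumed stand-in for a row-oriented, CSV-style transfer.

```r
df <- data.frame(y = runif(10^6, 0, 1))

# Text encoding (assumed stand-in for a CSV-style transfer):
# every double is formatted into a string, one value per line.
text_time <- system.time(
  txt <- paste(format(df$y, digits = 17), collapse = "\n")
)

# Binary columnar encoding: the whole column is written as one
# contiguous block of 8-byte doubles, with no per-value formatting.
bin_time <- system.time(
  bin <- writeBin(df$y, raw())
)

# Compare user/system/elapsed times for the two encodings
print(rbind(text = text_time[1:3], binary = bin_time[1:3]))

# The binary buffer is exactly 8 bytes per value and round-trips exactly
y_back <- readBin(bin, what = "double", n = length(df$y))
stopifnot(identical(y_back, df$y))
```

Timings will vary by machine; the point is that the binary path does a bulk copy of an existing buffer, which is also what `std::copy()` does below.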
And further down to the following when using `std::copy()`:
```
user system elapsed
0.107 0.008 0.388
```
I'm currently exploring
[ALTREP](https://svn.r-project.org/R/branches/ALTREP/ALTREP.html) to avoid the
`std::copy()` altogether.
[ Full content available at: https://github.com/apache/arrow/pull/2727 ]