Using this PR as a WIP to efficiently transfer data from R to Spark using Arrow.

This PR might ultimately be closed without being merged, but I thought it would
be good to give visibility into what I'm exploring.

Specifically, I'm working on supporting efficient execution of:

```r
library(sparklyr)
sc <- spark_connect(master = "local")
system.time({
  tbl_data <- sdf_copy_to(sc, data.frame(y = runif(10^6, 0, 1)), "data",
                          overwrite = TRUE)
})
```

Currently, without this PR or Arrow:

```r
system.time({
  tbl_data <- sdf_copy_to(sc, data.frame(y = runif(10^6, 0, 1)), "data",
                          overwrite = TRUE)
})
   user  system elapsed 
  1.120   0.087   3.482 
```

With Arrow loaded, this drops to:

```r
library(arrow)
system.time({
  tbl_data <- sdf_copy_to(sc, data.frame(y = runif(10^6, 0, 1)), "data",
                          overwrite = TRUE)
})
```
```
   user  system elapsed 
  0.222   0.029   0.641 
```

And it drops further, to the following, when using `std::copy()`:

```
   user  system elapsed 
  0.107   0.008   0.388 
```
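
To put the three timings side by side, here is a small sketch that computes the speedup relative to the baseline. It uses only the elapsed times quoted above; the variable names (`elapsed`, `speedup`) are illustrative and not part of the PR.

```r
# Elapsed times (seconds) copied from the system.time() outputs above.
elapsed <- c(baseline = 3.482, arrow = 0.641, arrow_std_copy = 0.388)

# Speedup of each configuration relative to the run without Arrow.
speedup <- elapsed[["baseline"]] / elapsed
print(round(speedup, 1))
```

On these single runs that is roughly a 5x improvement with Arrow and about 9x with `std::copy()`. These are one-off measurements on a local connection, not a benchmark.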

I'm currently exploring
[ALTREP](https://svn.r-project.org/R/branches/ALTREP/ALTREP.html) to avoid the
`std::copy()` call.

[ Full content available at: https://github.com/apache/arrow/pull/2727 ]