[jira] [Created] (SPARK-17904) Add a wrapper function to download and install R packages on each executors.

Yanbo Liang (JIRA) Thu, 13 Oct 2016 02:01:13 -0700

Yanbo Liang created SPARK-17904:
-----------------------------------

             Summary: Add a wrapper function to download and install R packages 
on each executors.
                 Key: SPARK-17904
                 URL: https://issues.apache.org/jira/browse/SPARK-17904
             Project: Spark
          Issue Type: New Feature
          Components: SparkR
            Reporter: Yanbo Liang



SparkR provides {{spark.lapply}} to run local R functions in a distributed 
environment, and {{dapply}} to run UDFs on a SparkDataFrame.
If the function passed to {{spark.lapply}} or {{dapply}} uses third-party 
libraries, the required R packages must be installed on each executor in 
advance.
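As a small illustration of the dependency problem, the sketch below defines a worker function that needs the {{Matrix}} package wherever it runs. The names {{diagSize}} and {{runOnCluster}} are hypothetical and used only for this example; {{runOnCluster}} assumes an active SparkR session.

```r
# Hypothetical worker function: it can only run where "Matrix" is installed.
diagSize <- function(n) {
  if (!requireNamespace("Matrix", quietly = TRUE)) {
    stop("package 'Matrix' is not installed on this executor")
  }
  # Build an n x n identity matrix with Matrix and report its row count.
  nrow(Matrix::Diagonal(n))
}

# Hypothetical driver-side call; requires an active SparkR session.
# Each element of 1:4 is processed by diagSize() in an R worker on an executor.
runOnCluster <- function() {
  SparkR::spark.lapply(1:4, diagSize)
}
```

If {{Matrix}} is missing on any executor that receives a task, the job fails there even though the same function works on the driver.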
To install the required R packages on each executor and verify that the 
installation succeeded, we can run code similar to the following:
{code}
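# Install "Matrix" from within each partition. RDD operations are lazy, so collect to trigger the job.
install <- function(part) { install.packages("Matrix"); list(TRUE) }
rdd <- SparkR:::lapplyPartition(SparkR:::parallelize(sc, 1:2, 2L), install)
SparkR:::collectRDD(rdd)
# Check from within each partition that the package is now installed.
test <- function(x) { "Matrix" %in% rownames(installed.packages()) }
rdd <- SparkR:::lapplyPartition(SparkR:::parallelize(sc, 1:2, 2L), test)
SparkR:::collectRDD(rdd)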
{code}
It is cumbersome to run this snippet every time a third-party library is 
needed. SparkR is an interactive analytics tool, and users may load many 
libraries during an analytics session. In native R, users can simply call 
{{install.packages()}} and {{library()}} throughout the interactive session.
Should we provide an API that wraps the work above, so that users can easily 
install the required R packages on each executor? 
I propose the following API:
{{spark.installPackages(pkgs, repos)}}
* pkgs: the names of the packages to install. If repos = NULL, this can 
instead be a local/HDFS path, in which case SparkR installs the packages from 
local package archives.
* repos: the base URL(s) of the repositories to use. It can be NULL to install 
from local directories.
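
One possible shape for the proposed wrapper is sketched below. It is not an existing SparkR function. The helper {{buildInstallArgs}}, the {{numTasks}} parameter, and the default CRAN mirror URL are assumptions made for illustration. The sketch assumes an active SparkR session and does not copy archives from HDFS to the executors; with repos = NULL it assumes each path in pkgs is already readable on the executor.

```r
# Hypothetical helper (not part of SparkR): build the install.packages() arguments.
buildInstallArgs <- function(pkgs, repos = "https://cloud.r-project.org") {
  stopifnot(is.character(pkgs), length(pkgs) > 0)
  if (is.null(repos)) {
    # pkgs are treated as paths to package archives readable on the executor.
    list(pkgs = pkgs, repos = NULL, type = "source")
  } else {
    # pkgs are package names resolved against the given repositories.
    list(pkgs = pkgs, repos = repos)
  }
}

# Sketch of the proposed API; numTasks is an assumed extra parameter.
spark.installPackages <- function(pkgs, repos = "https://cloud.r-project.org",
                                  numTasks = 2L) {
  installArgs <- buildInstallArgs(pkgs, repos)
  # Each task runs install.packages() inside the R worker it is scheduled on.
  SparkR::spark.lapply(seq_len(numTasks), function(i) {
    do.call(utils::install.packages, installArgs)
    # Report which requested names are now visible
    # (meaningful for package names, not for archive paths).
    pkgs %in% rownames(utils::installed.packages())
  })
}
```

A call such as {{spark.installPackages("Matrix")}} would then return one logical vector per task. Spark does not guarantee that the tasks land on every executor, so a real implementation would need a way to reach each one.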



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
