Sun Rui created SPARK-8041:
------------------------------

             Summary: Consistently pass SparkR library directory to SparkR application
                 Key: SPARK-8041
                 URL: https://issues.apache.org/jira/browse/SPARK-8041
             Project: Spark
          Issue Type: Improvement
          Components: SparkR
    Affects Versions: 1.4.0
            Reporter: Sun Rui


The SparkR package library directory path (RLibDir) is needed by SparkR 
applications to load the SparkR package and to locate the R helper files 
inside the package.

Currently, RLibDir has to be specified in several places.

First, when you write a SparkR application, sparkR.init() lets you pass an 
RLibDir parameter (by default, it is the same as the SparkR package's libname 
on the driver host). However, hard-coding RLibDir in a program does not seem 
reasonable. It would be more flexible to pass RLibDir via a command-line 
option or an environment variable.

Additionally, in YARN cluster mode, RRunner depends on the SPARK_HOME 
environment variable to get RLibDir (it assumes $SPARK_HOME/R/lib).
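
For illustration, the SPARK_HOME-based lookup described above can be sketched 
as follows. This is only a minimal sketch in Python, not RRunner's actual 
code; the function name rlibdir_from_spark_home is hypothetical, and the only 
detail taken from this issue is the $SPARK_HOME/R/lib layout.

```python
import os


def rlibdir_from_spark_home(env=None):
    """Hypothetical helper: derive RLibDir as $SPARK_HOME/R/lib.

    `env` is a mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        # Without SPARK_HOME there is nothing to derive the path from.
        raise KeyError("SPARK_HOME is not set; cannot locate R/lib")
    return os.path.join(spark_home, "R", "lib")
```

The sketch makes the dependency visible: if SPARK_HOME is missing or points 
somewhere else on the host, this lookup has no other source for RLibDir.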

So it would be better to define a consistent way to pass RLibDir to a SparkR 
application in all deployment modes. It could be a command-line option for 
bin/sparkR or an environment variable. Once RLibDir can be passed to a SparkR 
application this way, we can remove the RLibDir parameter of sparkR.init(). 
In YARN cluster mode, it can be passed to the AM using the 
spark.yarn.appMasterEnv.[EnvironmentVariableName] configuration option.
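
One possible shape for such a consistent lookup is sketched below in Python. 
Everything specific here is an assumption made for illustration: the 
environment variable name SPARKR_RLIBDIR, the function names, the precedence 
order (command-line value, then environment variable, then 
$SPARK_HOME/R/lib), and the fallback itself. Only the 
spark.yarn.appMasterEnv.[EnvironmentVariableName] key comes from this issue.

```python
import os

# Hypothetical name: the issue does not say which variable would be used.
RLIBDIR_ENV_VAR = "SPARKR_RLIBDIR"


def resolve_rlibdir(cli_value=None, env=None):
    """Resolve RLibDir the same way in every deployment mode (assumed order)."""
    env = os.environ if env is None else env
    if cli_value:
        # 1. An explicit command-line value wins.
        return cli_value
    if env.get(RLIBDIR_ENV_VAR):
        # 2. Otherwise use the (hypothetical) environment variable.
        return env[RLIBDIR_ENV_VAR]
    spark_home = env.get("SPARK_HOME")
    if spark_home:
        # 3. Otherwise fall back to the layout assumed today.
        return os.path.join(spark_home, "R", "lib")
    return None


def yarn_am_env_conf(rlibdir):
    """Build the --conf arguments that forward the variable to the YARN AM."""
    key = "spark.yarn.appMasterEnv.%s" % RLIBDIR_ENV_VAR
    return ["--conf", "%s=%s" % (key, rlibdir)]
```

With a single resolver like this, sparkR.init() would no longer need its own 
RLibDir parameter, and YARN cluster mode would only differ in how the value 
reaches the AM.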



