Sun Rui created SPARK-8041:
------------------------------
Summary: Consistently pass SparkR library directory to SparkR application
Key: SPARK-8041
URL: https://issues.apache.org/jira/browse/SPARK-8041
Project: Spark
Issue Type: Improvement
Components: SparkR
Affects Versions: 1.4.0
Reporter: Sun Rui

SparkR applications need the SparkR package library directory path (RLibDir) to load the SparkR package and to locate the R helper files inside the package.
Currently, RLibDir has to be specified in more than one place.
First, when you write a SparkR application, sparkR.init() lets you pass an RLibDir parameter (by default, the libname of the SparkR package on the driver host). However, hard-coding RLibDir in a program is not a reasonable approach; it would be more flexible to pass RLibDir via a command-line option or an environment variable.
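
As a minimal sketch of that idea (written in Python for illustration, not SparkR code), the lookup below resolves RLibDir from a command-line option first, then an environment variable, and only then falls back to a default. The flag `--rlibdir`, the variable `SPARKR_RLIBDIR`, and the default path are hypothetical names chosen for this example; they are not existing SparkR options.

```python
import argparse
import os
import sys

# Hypothetical names used only for this sketch; SparkR defines neither.
RLIBDIR_ENV = "SPARKR_RLIBDIR"
RLIBDIR_FLAG = "--rlibdir"


def parse_rlibdir_option(argv):
    """Extract the hypothetical --rlibdir value, ignoring all other arguments."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(RLIBDIR_FLAG, default=None)
    known, _rest = parser.parse_known_args(argv)
    return known.rlibdir


def resolve_rlibdir(cli_value, env, default):
    """Pick RLibDir by precedence: command line, then environment, then default."""
    if cli_value:
        return cli_value
    env_value = env.get(RLIBDIR_ENV)
    if env_value:
        return env_value
    return default


if __name__ == "__main__":
    # Placeholder default standing in for the SparkR package's libname on the driver host.
    rlibdir = resolve_rlibdir(
        parse_rlibdir_option(sys.argv[1:]), os.environ, "/usr/lib/R/site-library"
    )
    print(rlibdir)
```

Run as `python resolve.py --rlibdir /opt/spark/R/lib app.R`, it prints `/opt/spark/R/lib`. Letting the command line override the environment follows the usual CLI convention that the more explicit setting wins.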
Second, in YARN cluster mode, RRunner relies on the SPARK_HOME environment variable to locate RLibDir (it assumes $SPARK_HOME/R/lib).
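
The YARN cluster behavior described above amounts to deriving the path from a single variable. The Python sketch below only illustrates that assumption; it is not RRunner's actual (Scala) code. It also shows the weakness: if SPARK_HOME is unset on the node, there is no other source for RLibDir.

```python
import os


def rlibdir_from_spark_home(env):
    """Illustrate the assumption that RLibDir is $SPARK_HOME/R/lib."""
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        # Under this scheme there is nothing else to fall back on.
        raise KeyError("SPARK_HOME is not set; cannot derive RLibDir")
    return os.path.join(spark_home, "R", "lib")


if __name__ == "__main__":
    print(rlibdir_from_spark_home({"SPARK_HOME": "/opt/spark"}))
```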
It would therefore be better to define a consistent way to pass RLibDir to a SparkR application in all deployment modes, either as a command-line option for bin/sparkR or as an environment variable. The value would then be passed on to the SparkR application, and the RLibDir parameter of sparkR.init() could be removed. In YARN cluster mode, it can be passed to the ApplicationMaster (AM) using the spark.yarn.appMasterEnv.[EnvironmentVariableName] configuration option.
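
A minimal Python sketch of the YARN cluster case: it builds spark-submit arguments that set an environment variable on the AM through spark.yarn.appMasterEnv.[EnvironmentVariableName]. The variable name `SPARKR_RLIBDIR` is hypothetical, and `yarn-cluster` is the Spark 1.x master value for YARN cluster mode; the sketch only assembles the argument list and does not launch anything.

```python
import shlex

# Hypothetical variable name; SparkR does not define it.
RLIBDIR_ENV = "SPARKR_RLIBDIR"


def am_env_conf(name, value):
    """Return the --conf pair that sets env variable `name` on the YARN AM."""
    return ["--conf", "spark.yarn.appMasterEnv.{}={}".format(name, value)]


def build_submit_args(app_file, rlibdir, env_name=RLIBDIR_ENV):
    """Assemble (but do not run) a spark-submit command for YARN cluster mode."""
    return (
        ["spark-submit", "--master", "yarn-cluster"]
        + am_env_conf(env_name, rlibdir)
        + [app_file]
    )


if __name__ == "__main__":
    args = build_submit_args("app.R", "/opt/spark/R/lib")
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(a) for a in args))
```

Run as a script, it prints `spark-submit --master yarn-cluster --conf spark.yarn.appMasterEnv.SPARKR_RLIBDIR=/opt/spark/R/lib app.R`. The R process started in the AM could then read the value from its environment, so the same variable works for both client-side and cluster-side launches.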
--
This message was sent by Atlassian JIRA (v6.3.4#6332)