[ https://issues.apache.org/jira/browse/SPARK-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949199#comment-14949199 ]
Thomas Graves commented on SPARK-10971:
---------------------------------------

You shouldn't have to install everything a user needs on the YARN nodes. That can cause many different kinds of problems, the main ones being version conflicts and a maintenance headache. The only downside to shipping dependencies with the application is the download overhead if the distributed cache isn't used properly. Perhaps there are distributions that recommend otherwise, or cases where you want R installed on the nodes for performance reasons, but a general-use YARN cluster needs to let users ship their dependencies with their applications.

So yes, I am just suggesting that the path to Rscript be configurable: you should be able to set a config like spark.sparkr.r.command to point to wherever Rscript is located. (A hypothetical usage sketch follows the quoted issue below.)

> sparkR: RRunner should allow setting path to Rscript
> ----------------------------------------------------
>
>                 Key: SPARK-10971
>                 URL: https://issues.apache.org/jira/browse/SPARK-10971
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.5.1
>            Reporter: Thomas Graves
>
> I'm running Spark on YARN and trying to use R in cluster mode. RRunner seems
> to just call Rscript and assumes it's on the PATH. But on our YARN deployment
> R isn't installed on the nodes, so it needs to be distributed along with the
> job, and we need the ability to point to where it gets installed. sparkR in
> client mode has the config spark.sparkr.r.command to point to Rscript;
> RRunner should have something similar so it works in cluster mode.
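To make the suggestion concrete, here is a minimal sketch of what a cluster-mode submission might look like if RRunner honored such a config. The archive name, HDFS path, and internal layout below are hypothetical; the idea is to ship R through the YARN distributed cache with --archives and point the config at the Rscript binary inside the unpacked archive:

    # hypothetical paths; assumes R-3.2.2.tgz unpacks with bin/Rscript at its top level
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --archives hdfs:///apps/R-3.2.2.tgz#R \
      --conf spark.sparkr.r.command=./R/bin/Rscript \
      my_analysis.R

The #R suffix asks YARN to unpack the archive under a directory named R in each container's working directory, so the relative path to Rscript resolves on every node without anything being preinstalled.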