For host information, are you looking for something like this (which is already available in Spark 1.5)?
# Spark related configuration
Sys.setenv("SPARK_MASTER_IP" = "127.0.0.1")
Sys.setenv("SPARK_LOCAL_IP" = "127.0.0.1")

# Load libraries
library("rJava")
library(SparkR, lib.loc = "/...../spark-bin/R/lib")

# Initialize the Spark context
sc <- sparkR.init(sparkHome = "/...../spark-bin",
                  sparkPackages = "com.databricks:spark-csv_2.11:1.2.0")

On Thu, Sep 24, 2015 at 2:09 PM, Hossein <fal...@gmail.com> wrote:

> Right now in sparkR.R the backend hostname is hard-coded to "localhost"
> (https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L156).
>
> If we make that address configurable / parameterized, then a user can
> connect to a remote Spark cluster with no need to have Spark jars on
> their local machine. I have gotten this request from some R users. Their
> company has a Spark cluster (usually managed by another team), and they
> want to connect to it from their workstations (e.g., from within
> RStudio).
>
> --Hossein
>
> On Thu, Sep 24, 2015 at 12:25 PM, Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>
>> I don't think the crux of the problem is about users who download the
>> source -- Spark's source distribution is clearly marked as something
>> that needs to be built, and they can run `mvn -DskipTests -Psparkr
>> package` based on instructions in the Spark docs.
>>
>> The crux of the problem is that with a source or binary R package, the
>> client-side SparkR code needs the Spark JARs to be available. So we
>> can't just connect to a remote Spark cluster using only the R scripts,
>> as we need the Scala classes around to create a Spark context, etc.
>>
>> But this is a use case that I've heard from a lot of users -- my take
>> is that this should be a separate package / layer on top of SparkR.
>> Dan Putler (cc'd) had a proposal for a client package along these
>> lines and may be able to add more.
>>
>> Thanks
>> Shivaram
>>
>> On Thu, Sep 24, 2015 at 11:36 AM, Hossein <fal...@gmail.com> wrote:
>> > Requiring users to download the entire Spark distribution to connect
>> > to a remote cluster (which is already running Spark) seems like
>> > overkill. Even for most Spark users who download the Spark source,
>> > it is very unintuitive that they need to run a script named
>> > "install-dev.sh" before they can run SparkR.
>> >
>> > --Hossein
>> >
>> > On Wed, Sep 23, 2015 at 7:28 PM, Sun, Rui <rui....@intel.com> wrote:
>> >>
>> >> The SparkR package is not a standalone R package: it is the R API
>> >> of Spark and needs to cooperate with a matching version of Spark,
>> >> so exposing it on CRAN would not make things easier for R users, as
>> >> they would still need to download a matching Spark distribution --
>> >> unless we publish a bundled SparkR package to CRAN (packaged with
>> >> Spark). Is that desirable? Actually, normal users who are not
>> >> developers are not required to download the Spark source, build it,
>> >> and install the SparkR package. They just need to download a Spark
>> >> distribution and then use SparkR.
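On the remote-connection request above: a minimal sketch of the host
parameterization Hossein describes, modeled on the way sparkR.R opens its
socket today. The SPARKR_BACKEND_HOST variable and the fixed port below
are hypothetical placeholders, not existing Spark settings:

# Hypothetical: read the backend host from the environment instead of the
# hard-coded "localhost"; the default preserves today's behavior.
backendHost <- Sys.getenv("SPARKR_BACKEND_HOST", "localhost")
backendPort <- 8990L  # placeholder; the real port is chosen by the backend

# sparkR.R connects to the JVM backend with a socket much like this one;
# with the host parameterized, the same call could reach a remote backend
# (assuming a backend process is actually listening on that host/port).
conn <- socketConnection(host = backendHost, port = backendPort,
                         blocking = TRUE, open = "wb", timeout = 6000)
close(conn)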
>> >>
>> >> For using SparkR in RStudio, there is documentation at
>> >> https://github.com/apache/spark/tree/master/R
>> >>
>> >> From: Hossein [mailto:fal...@gmail.com]
>> >> Sent: Thursday, September 24, 2015 1:42 AM
>> >> To: shiva...@eecs.berkeley.edu
>> >> Cc: Sun, Rui; dev@spark.apache.org
>> >> Subject: Re: SparkR package path
>> >>
>> >> Yes, I think exposing SparkR on CRAN could significantly expand the
>> >> reach of both SparkR and Spark itself to a larger community of data
>> >> scientists (and statisticians).
>> >>
>> >> I have been getting questions on how to use SparkR in RStudio. Most
>> >> of these folks have a Spark cluster and wish to talk to it from
>> >> RStudio. While that is a bigger task, for now a first step could be
>> >> not requiring them to download the Spark source and run a script
>> >> named install-dev.sh. I filed SPARK-10776 to track this.
>> >>
>> >> --Hossein
>> >>
>> >> On Tue, Sep 22, 2015 at 7:21 PM, Shivaram Venkataraman
>> >> <shiva...@eecs.berkeley.edu> wrote:
>> >>
>> >> As Rui says, it would be good to understand the use case we want to
>> >> support (supporting CRAN installs could be one, for example). I
>> >> don't think it should be very hard to do, as the RBackend itself
>> >> doesn't use the R source files. The RRDD does use them, and the
>> >> value comes from
>> >> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RUtils.scala#L29
>> >> AFAIK -- so we could introduce a new config flag that can be used
>> >> for this new mode.
>> >>
>> >> Thanks
>> >> Shivaram
>> >>
>> >> On Mon, Sep 21, 2015 at 8:15 PM, Sun, Rui <rui....@intel.com> wrote:
>> >> > Hossein,
>> >> >
>> >> > Is there any strong reason to download and install the SparkR
>> >> > source package separately from the Spark distribution?
>> >> >
>> >> > An R user can simply download the Spark distribution, which
>> >> > contains the SparkR source and binary packages, and use SparkR
>> >> > directly. There is no need to install the SparkR package at all.
>> >> >
>> >> > From: Hossein [mailto:fal...@gmail.com]
>> >> > Sent: Tuesday, September 22, 2015 9:19 AM
>> >> > To: dev@spark.apache.org
>> >> > Subject: SparkR package path
>> >> >
>> >> > Hi dev list,
>> >> >
>> >> > The SparkR backend assumes SparkR source files are located under
>> >> > "SPARK_HOME/R/lib/". This directory is created by running
>> >> > R/install-dev.sh. This setting makes sense for Spark developers,
>> >> > but if an R user downloads and installs the SparkR source package,
>> >> > the source files are going to be placed in different locations.
>> >> >
>> >> > In the R runtime it is easy to find the location of package files
>> >> > using path.package("SparkR"), but we need to make some changes to
>> >> > the R backend and/or spark-submit so that the JVM process learns
>> >> > the location of worker.R, daemon.R, and shell.R from the R
>> >> > runtime.
>> >> >
>> >> > Do you think this change is feasible?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > --Hossein

--
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
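On the package-path question in Hossein's original mail at the bottom of
the thread, a rough sketch of how the R side could resolve the installed
location and hand it to the JVM. The "spark.r.package.dir" property name
is made up for illustration, not an existing Spark configuration:

# The R runtime already knows where the package lives (SparkR must be
# loaded for path.package() to find it).
library(SparkR)
sparkrDir <- path.package("SparkR")
# worker.R ships under the package's "worker" directory once installed.
workerScript <- file.path(sparkrDir, "worker", "worker.R")

# Hypothetical: forward the location to the JVM as a Spark property so
# the backend no longer needs to assume SPARK_HOME/R/lib.
sc <- sparkR.init(sparkEnvir = list(spark.r.package.dir = sparkrDir))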