For host information, are you looking for something like this (which is
already available in Spark 1.5)?

# Spark-related configuration
Sys.setenv("SPARK_MASTER_IP" = "127.0.0.1")
Sys.setenv("SPARK_LOCAL_IP" = "127.0.0.1")

# Load libraries
library("rJava")
library(SparkR, lib.loc = "/...../spark-bin/R/lib")

# Initialize the Spark context
sc <- sparkR.init(sparkHome = "/...../spark-bin",
                  sparkPackages = "com.databricks:spark-csv_2.11:1.2.0")
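
As a quick sanity check after connecting, one could then read a CSV file
through the spark-csv package loaded above (a sketch only; the file path
is a placeholder):

# create a SQL context on top of the Spark context
sqlContext <- sparkRSQL.init(sc)

# read a CSV file via the spark-csv data source (path is illustrative)
df <- read.df(sqlContext, "/path/to/data.csv",
              source = "com.databricks.spark.csv", header = "true")
head(df)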

On Thu, Sep 24, 2015 at 2:09 PM, Hossein <fal...@gmail.com> wrote:

> Right now in sparkR.R the backend hostname is hard-coded to "localhost"
> (https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L156).
>
> If we make that address configurable / parameterized, then a user could
> connect to a remote Spark cluster with no need to have the Spark jars on
> their local machine. I have gotten this request from some R users. Their
> company has a Spark cluster (usually managed by another team), and they
> want to connect to it from their workstations (e.g., from within RStudio).
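>
> As a rough sketch, if that address were parameterized (the backendHost
> argument below is hypothetical, not part of SparkR today), connecting
> might look like:
>
> # hypothetical: point the R process at a backend running elsewhere
> sc <- sparkR.init(master = "spark://cluster.example.com:7077",
>                   backendHost = "cluster.example.com")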
>
> --Hossein
>
> On Thu, Sep 24, 2015 at 12:25 PM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
>> I don't think the crux of the problem is about users who download the
>> source -- Spark's source distribution is clearly marked as something
>> that needs to be built, and they can run `mvn -DskipTests -Psparkr
>> package` based on instructions in the Spark docs.
>>
>> The crux of the problem is that with a source or binary R package, the
>> SparkR code on the client side needs the Spark JARs to be available. So
>> we can't connect to a remote Spark cluster using just the R scripts, as
>> we need the Scala classes around to create a Spark context etc.
>>
>> But this is a use case that I've heard from a lot of users -- my take
>> is that this should be a separate package / layer on top of SparkR.
>> Dan Putler (cc'd) had a proposal on a client package for this and may
>> be able to add more.
>>
>> Thanks
>> Shivaram
>>
>> On Thu, Sep 24, 2015 at 11:36 AM, Hossein <fal...@gmail.com> wrote:
>> > Requiring users to download the entire Spark distribution to connect
>> > to a remote cluster (which is already running Spark) seems like
>> > overkill. Even for most Spark users who download the Spark source, it
>> > is very unintuitive that they need to run a script named
>> > "install-dev.sh" before they can run SparkR.
>> >
>> > --Hossein
>> >
>> > On Wed, Sep 23, 2015 at 7:28 PM, Sun, Rui <rui....@intel.com> wrote:
>> >>
>> >> The SparkR package is not a standalone R package; it is actually the
>> >> R API of Spark and needs to cooperate with a matching version of
>> >> Spark. So exposing it on CRAN does not make things easier for R
>> >> users, as they would still need to download a matching Spark
>> >> distribution -- unless we publish a bundled SparkR package to CRAN
>> >> (packaged together with Spark). Is that desirable? Actually, normal
>> >> users who are not developers are not required to download the Spark
>> >> source, build, and install the SparkR package. They just need to
>> >> download a Spark distribution and then use SparkR.
>> >>
>> >> For using SparkR in RStudio, there is documentation at
>> >> https://github.com/apache/spark/tree/master/R
>> >>
>> >> From: Hossein [mailto:fal...@gmail.com]
>> >> Sent: Thursday, September 24, 2015 1:42 AM
>> >> To: shiva...@eecs.berkeley.edu
>> >> Cc: Sun, Rui; dev@spark.apache.org
>> >> Subject: Re: SparkR package path
>> >>
>> >> Yes, I think exposing SparkR on CRAN can significantly expand the
>> >> reach of both SparkR and Spark itself to a larger community of data
>> >> scientists (and statisticians).
>> >>
>> >> I have been getting questions on how to use SparkR in RStudio. Most
>> >> of these folks have a Spark cluster and wish to talk to it from
>> >> RStudio. While that is a bigger task, a first step could be not
>> >> requiring them to download the Spark source and run a script named
>> >> install-dev.sh. I filed SPARK-10776 to track this.
>> >>
>> >> --Hossein
>> >>
>> >> On Tue, Sep 22, 2015 at 7:21 PM, Shivaram Venkataraman
>> >> <shiva...@eecs.berkeley.edu> wrote:
>> >>
>> >> As Rui says, it would be good to understand the use case we want to
>> >> support (supporting CRAN installs could be one, for example). I don't
>> >> think it should be very hard to do, as the RBackend itself doesn't
>> >> use the R source files. RRDD does use them, and the value comes from
>> >> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RUtils.scala#L29
>> >> AFAIK -- so we could introduce a new config flag that can be used for
>> >> this new mode.
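>> >>
>> >> A sketch of what that might look like (the flag name below is purely
>> >> illustrative, not an existing Spark config):
>> >>
>> >> # hypothetical flag telling the backend not to expect local R sources
>> >> sc <- sparkR.init(master = "spark://cluster.example.com:7077",
>> >>                   sparkEnvir = list(spark.r.remote.backend = "true"))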
>> >>
>> >> Thanks
>> >> Shivaram
>> >>
>> >>
>> >> On Mon, Sep 21, 2015 at 8:15 PM, Sun, Rui <rui....@intel.com> wrote:
>> >> > Hossein,
>> >> >
>> >> > Is there any strong reason to download and install the SparkR
>> >> > source package separately from the Spark distribution?
>> >> >
>> >> > An R user can simply download the Spark distribution, which
>> >> > contains the SparkR source and binary packages, and directly use
>> >> > SparkR. There is no need to install the SparkR package at all.
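>> >> >
>> >> > For example, from an unpacked Spark distribution this is enough (a
>> >> > sketch; SPARK_HOME is assumed to point at the unpacked directory):
>> >> >
>> >> > # load SparkR straight from the distribution, no install step
>> >> > library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))
>> >> > sc <- sparkR.init(master = "local[*]")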
>> >> >
>> >> > From: Hossein [mailto:fal...@gmail.com]
>> >> > Sent: Tuesday, September 22, 2015 9:19 AM
>> >> > To: dev@spark.apache.org
>> >> > Subject: SparkR package path
>> >> >
>> >> > Hi dev list,
>> >> >
>> >> > The SparkR backend assumes SparkR source files are located under
>> >> > "SPARK_HOME/R/lib/". This directory is created by running
>> >> > R/install-dev.sh. This setting makes sense for Spark developers,
>> >> > but if an R user downloads and installs the SparkR source package,
>> >> > the source files are going to be placed in different locations.
>> >> >
>> >> > In the R runtime it is easy to find the location of package files
>> >> > using path.package("SparkR"). But we need to make some changes to
>> >> > the R backend and/or spark-submit so that the JVM process learns
>> >> > the locations of worker.R, daemon.R, and shell.R from the R
>> >> > runtime.
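>> >> >
>> >> > As a sketch, the R side could resolve the location and hand it to
>> >> > the JVM (the environment variable name here is just illustrative):
>> >> >
>> >> > # resolve the installed package directory from within R
>> >> > pkg <- path.package("SparkR")
>> >> > # hypothetical: expose it so spark-submit / the backend can read it
>> >> > Sys.setenv("SPARKR_PACKAGE_DIR" = pkg)
>> >> > file.path(pkg, "worker", "worker.R")  # where worker.R would live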
>> >> >
>> >> > Do you think this change is feasible?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > --Hossein


-- 
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
