Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Eduardo Costa Alfaia
Hi Gourav,

I did a test as you suggested and it is working for me. I am running Spark locally, 
with the master and worker on the same machine. I ran the example in spark-shell 
--packages com.databricks:spark-csv_2.10:1.3.0 without errors.
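
For comparison, a minimal check of the same package from pyspark, assuming it was 
launched as "pyspark --packages com.databricks:spark-csv_2.10:1.3.0" on a Spark 1.x 
install (so sqlContext is already defined in the shell); the CSV path is a placeholder:

# Quick check that the spark-csv data source resolved.
# The file path below is a placeholder.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .options(header="true", inferSchema="true")
      .load("/path/to/some_file.csv"))
df.printSchema()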

BR

From:  Gourav Sengupta <gourav.sengu...@gmail.com>
Date:  Monday, February 15, 2016 at 10:03
To:  Jorge Machado <jom...@me.com>
Cc:  Spark Group <user@spark.apache.org>
Subject:  Re: Using SPARK packages in Spark Cluster

Hi Jorge/ All,

Please, please, please go through this link: 
http://spark.apache.org/docs/latest/spark-standalone.html. 
The link tells you how to start a SPARK cluster in local mode. If you have not 
started or worked with a SPARK cluster in local mode, kindly do not attempt to 
answer this question.

My question is how to use packages like 
https://github.com/databricks/spark-csv when I am using a SPARK cluster in local 
mode.

Regards,
Gourav Sengupta


On Mon, Feb 15, 2016 at 1:55 PM, Jorge Machado <jom...@me.com> wrote:
Hi Gourav, 

I did not understand your problem. The --packages option should not make 
any difference whether you are running standalone or on YARN, for example. 
Give us an example of what packages you are trying to load and what error you are 
getting. If you want to use the libraries on spark-packages.org without 
--packages, why do you not use Maven?
Regards 


On 12/02/2016, at 13:22, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

Hi,

I am creating a SparkContext in a SPARK standalone cluster as mentioned here: 
http://spark.apache.org/docs/latest/spark-standalone.html using the following 
code:

--
sc.stop()
conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
  .setMaster("spark://hostname:7077") \
  .set('spark.shuffle.service.enabled', True) \
  .set('spark.dynamicAllocation.enabled', 'true') \
  .set('spark.executor.memory', '20g') \
  .set('spark.driver.memory', '4g') \
  .set('spark.default.parallelism', (multiprocessing.cpu_count() - 1))
conf.getAll()
sc = SparkContext(conf = conf)

-(we should definitely be able to optimise the configuration but that is 
not the point here) ---

Using this method, I am not able to load packages (a list of which is available 
at http://spark-packages.org).

Whereas if I use the standard "pyspark --packages" option, the packages 
load just fine.

I will be grateful if someone could kindly let me know how to load packages 
when starting a cluster as mentioned above.


Regards,
Gourav Sengupta






Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi Jorge/ All,

Please, please, please go through this link:
http://spark.apache.org/docs/latest/spark-standalone.html.

The link tells you how to start a SPARK cluster in local mode. If you have
not started or worked with a SPARK cluster in local mode, kindly do not attempt
to answer this question.

My question is how to use packages like
https://github.com/databricks/spark-csv when I am using a SPARK cluster in local
mode.

Regards,
Gourav Sengupta



On Mon, Feb 15, 2016 at 1:55 PM, Jorge Machado  wrote:

> Hi Gourav,
>
> I did not unterstand your problem… the - - packages  command should not
> make any difference if you are running standalone or in YARN for example.
> Give us an example what packages are you trying to load, and what error
> are you getting…  If you want to use the libraries in spark-packages.org without
> the --packages why do you not use maven ?
> Regards
>
>
> On 12/02/2016, at 13:22, Gourav Sengupta 
> wrote:
>
> Hi,
>
> I am creating sparkcontext in a SPARK standalone cluster as mentioned
> here: http://spark.apache.org/docs/latest/spark-standalone.html using the
> following code:
>
>
> --
> sc.stop()
> conf = SparkConf().set( 'spark.driver.allowMultipleContexts' , False) \
>   .setMaster("spark://hostname:7077") \
>   .set('spark.shuffle.service.enabled', True) \
>   .set('spark.dynamicAllocation.enabled','true') \
>   .set('spark.executor.memory','20g') \
>   .set('spark.driver.memory', '4g') \
>
> .set('spark.default.parallelism',(multiprocessing.cpu_count() -1 ))
> conf.getAll()
> sc = SparkContext(conf = conf)
>
> -(we should definitely be able to optimise the configuration but that
> is not the point here) ---
>
> I am not able to use packages, a list of which is mentioned here
> http://spark-packages.org, using this method.
>
> Where as if I use the standard "pyspark --packages" option then the
> packages load just fine.
>
> I will be grateful if someone could kindly let me know how to load
> packages when starting a cluster as mentioned above.
>
>
> Regards,
> Gourav Sengupta
>
>
>


Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Jorge Machado
Hi Gourav, 

I did not understand your problem. The --packages option should not make 
any difference whether you are running standalone or on YARN, for example. 
Give us an example of what packages you are trying to load and what error you are 
getting. If you want to use the libraries on spark-packages.org without 
--packages, why do you not use Maven?
Regards 


> On 12/02/2016, at 13:22, Gourav Sengupta  wrote:
> 
> Hi,
> 
> I am creating sparkcontext in a SPARK standalone cluster as mentioned here: 
> http://spark.apache.org/docs/latest/spark-standalone.html using the 
> following code:
> 
> --
> sc.stop()
> conf = SparkConf().set( 'spark.driver.allowMultipleContexts' , False) \
>   .setMaster("spark://hostname:7077") \
>   .set('spark.shuffle.service.enabled', True) \
>   .set('spark.dynamicAllocation.enabled','true') \
>   .set('spark.executor.memory','20g') \
>   .set('spark.driver.memory', '4g') \
>   
> .set('spark.default.parallelism',(multiprocessing.cpu_count() -1 ))
> conf.getAll()
> sc = SparkContext(conf = conf)
> 
> -(we should definitely be able to optimise the configuration but that is 
> not the point here) ---
> 
> I am not able to use packages, a list of which is mentioned here 
> http://spark-packages.org, using this method. 
> 
> Where as if I use the standard "pyspark --packages" option then the packages 
> load just fine.
> 
> I will be grateful if someone could kindly let me know how to load packages 
> when starting a cluster as mentioned above.
> 
> 
> Regards,
> Gourav Sengupta



Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi,

I am grateful for everyone's response, but sadly no one here has actually
read the question before responding.

Has anyone yet tried starting a SPARK cluster as mentioned in the link in
my email?

:)

Regards,
Gourav

On Mon, Feb 15, 2016 at 11:16 AM, Jorge Machado  wrote:

> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
>
>
>
> It will download everything for you and register into your  JVM.  If you
> want to use it in your Prod just package it with maven.
>
> On 15/02/2016, at 12:14, Gourav Sengupta 
> wrote:
>
> Hi,
>
> How to we include the following package:
> https://github.com/databricks/spark-csv while starting a SPARK standalone
> cluster as mentioned here:
> http://spark.apache.org/docs/latest/spark-standalone.html
>
>
>
> Thanks and Regards,
> Gourav Sengupta
>
> On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R 
> wrote:
>
>> Hi Gourav,
>>
>> If your question is how to distribute python package dependencies across
>> the Spark cluster programmatically? ...here is an example -
>>
>>  $ export
>> PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'
>>
>> And in code:
>>
>> sc.addPyFile('/path/to/thrift.zip')
>> sc.addPyFile('/path/to/happybase.zip')
>>
>> Regards,
>> Ram
>>
>>
>>
>> On 15 February 2016 at 15:16, Gourav Sengupta 
>> wrote:
>>
>>> Hi,
>>>
>>> So far no one is able to get my question at all. I know what it takes to
>>> load packages via SPARK shell or SPARK submit.
>>>
>>> How do I load packages when starting a SPARK cluster, as mentioned here
>>> http://spark.apache.org/docs/latest/spark-standalone.html ?
>>>
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>>
>>>
>>>
>>> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot 
>>> wrote:
>>>
 with conf option

 spark-submit --conf 'key = value '

 Hope that helps you.

 On 15 February 2016 at 11:21, Divya Gehlot 
 wrote:

> Hi Gourav,
> you can use like below to load packages at the start of the spark
> shell.
>
> spark-shell  --packages com.databricks:spark-csv_2.10:1.1.0
>
> On 14 February 2016 at 03:34, Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> I was interested in knowing how to load the packages into SPARK
>> cluster started locally. Can someone pass me on the links to set the conf
>> file so that the packages can be loaded?
>>
>> Regards,
>> Gourav
>>
>> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz 
>> wrote:
>>
>>> Hello Gourav,
>>>
>>> The packages need to be loaded BEFORE you start the JVM, therefore
>>> you won't be able to add packages dynamically in code. You should use 
>>> the
>>> --packages with pyspark before you start your application.
>>> One option is to add a `conf` that will load some packages if you
>>> are constantly going to use them.
>>>
>>> Best,
>>> Burak
>>>
>>>
>>>
>>> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <
>>> gourav.sengu...@gmail.com> wrote:
>>>
 Hi,

 I am creating sparkcontext in a SPARK standalone cluster as
 mentioned here:
 http://spark.apache.org/docs/latest/spark-standalone.html using
 the following code:


 --
 sc.stop()
 conf = SparkConf().set( 'spark.driver.allowMultipleContexts' ,
 False) \
   .setMaster("spark://hostname:7077") \
   .set('spark.shuffle.service.enabled', True) \
   .set('spark.dynamicAllocation.enabled','true') \
   .set('spark.executor.memory','20g') \
   .set('spark.driver.memory', '4g') \

 .set('spark.default.parallelism',(multiprocessing.cpu_count() -1 ))
 conf.getAll()
 sc = SparkContext(conf = conf)

 -(we should definitely be able to optimise the configuration
 but that is not the point here) ---

 I am not able to use packages, a list of which is mentioned here
 http://spark-packages.org, using this method.

 Where as if I use the standard "pyspark --packages" option then the
 packages load just fine.

 I will be grateful if someone could kindly let me know how to load
 packages when starting a cluster as mentioned above.


 Regards,
 Gourav Sengupta

>>>
>>>
>>
>

>>>
>>
>
>


Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Jorge Machado
$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0


It will download everything for you and put it on your JVM's classpath. If you 
want to use it in production, just package it with Maven.
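
For the production route, a minimal PySpark sketch of supplying a pre-built jar 
with --jars instead of resolving Maven coordinates with --packages; the jar path, 
master URL, and the PYSPARK_SUBMIT_ARGS mechanism are assumptions for illustration, 
not something confirmed in this thread:

# Sketch only: point the driver and executors at a locally built jar via --jars.
# PYSPARK_SUBMIT_ARGS must be set before the SparkContext is created and must
# end with "pyspark-shell".
import os

# --jars takes a comma-separated list; transitive dependencies of the package
# would need to be listed here as well (unlike --packages, which resolves them).
os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars /path/to/spark-csv_2.10-1.3.0.jar pyspark-shell"

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setMaster("spark://hostname:7077").setAppName("csv-demo"))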

> On 15/02/2016, at 12:14, Gourav Sengupta  wrote:
> 
> Hi,
> 
> How to we include the following package: 
> https://github.com/databricks/spark-csv while starting a SPARK standalone 
> cluster as mentioned here: 
> http://spark.apache.org/docs/latest/spark-standalone.html 
> 
> 
> 
> 
> Thanks and Regards,
> Gourav Sengupta
> 
> On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R wrote:
> Hi Gourav, 
> 
> If your question is how to distribute python package dependencies across the 
> Spark cluster programmatically? ...here is an example - 
> 
>  $ export 
> PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'
> 
> And in code:
> 
> sc.addPyFile('/path/to/thrift.zip')
> sc.addPyFile('/path/to/happybase.zip')
> 
> Regards, 
> Ram
> 
> 
> 
> On 15 February 2016 at 15:16, Gourav Sengupta wrote:
> Hi,
> 
> So far no one is able to get my question at all. I know what it takes to load 
> packages via SPARK shell or SPARK submit. 
> 
> How do I load packages when starting a SPARK cluster, as mentioned here 
> http://spark.apache.org/docs/latest/spark-standalone.html ?
> 
> 
> Regards,
> Gourav Sengupta
> 
> 
> 
> 
> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot wrote:
> with conf option 
> 
> spark-submit --conf 'key = value '
> 
> Hope that helps you.
> 
> On 15 February 2016 at 11:21, Divya Gehlot wrote:
> Hi Gourav,
> you can use like below to load packages at the start of the spark shell.
> 
> spark-shell  --packages com.databricks:spark-csv_2.10:1.1.0   
> 
> On 14 February 2016 at 03:34, Gourav Sengupta wrote:
> Hi,
> 
> I was interested in knowing how to load the packages into SPARK cluster 
> started locally. Can someone pass me on the links to set the conf file so 
> that the packages can be loaded? 
> 
> Regards,
> Gourav
> 
> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz wrote:
> Hello Gourav,
> 
> The packages need to be loaded BEFORE you start the JVM, therefore you won't 
> be able to add packages dynamically in code. You should use the --packages 
> with pyspark before you start your application.
> One option is to add a `conf` that will load some packages if you are 
> constantly going to use them.
> 
> Best,
> Burak
> 
> 
> 
> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta wrote:
> Hi,
> 
> I am creating sparkcontext in a SPARK standalone cluster as mentioned here: 
> http://spark.apache.org/docs/latest/spark-standalone.html using the 
> following code:
> 
> --
> sc.stop()
> conf = SparkConf().set( 'spark.driver.allowMultipleContexts' , False) \
>   .setMaster("spark://hostname:7077") \
>   .set('spark.shuffle.service.enabled', True) \
>   .set('spark.dynamicAllocation.enabled','true') \
>   .set('spark.executor.memory','20g') \
>   .set('spark.driver.memory', '4g') \
>   
> .set('spark.default.parallelism',(multiprocessing.cpu_count() -1 ))
> conf.getAll()
> sc = SparkContext(conf = conf)
> 
> -(we should definitely be able to optimise the configuration but that is 
> not the point here) ---
> 
> I am not able to use packages, a list of which is mentioned here 
> http://spark-packages.org, using this method. 
> 
> Where as if I use the standard "pyspark --packages" option then the packages 
> load just fine.
> 
> I will be grateful if someone could kindly let me know how to load packages 
> when starting a cluster as mentioned above.
> 
> 
> Regards,
> Gourav Sengupta
> 
> 
> 
> 
> 
> 
> 



Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi,

How do we include the following package:
https://github.com/databricks/spark-csv while starting a SPARK standalone
cluster as mentioned here:
http://spark.apache.org/docs/latest/spark-standalone.html



Thanks and Regards,
Gourav Sengupta

On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R 
wrote:

> Hi Gourav,
>
> If your question is how to distribute python package dependencies across
> the Spark cluster programmatically? ...here is an example -
>
>  $ export
> PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'
>
> And in code:
>
> sc.addPyFile('/path/to/thrift.zip')
> sc.addPyFile('/path/to/happybase.zip')
>
> Regards,
> Ram
>
>
>
> On 15 February 2016 at 15:16, Gourav Sengupta 
> wrote:
>
>> Hi,
>>
>> So far no one is able to get my question at all. I know what it takes to
>> load packages via SPARK shell or SPARK submit.
>>
>> How do I load packages when starting a SPARK cluster, as mentioned here
>> http://spark.apache.org/docs/latest/spark-standalone.html ?
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>>
>>
>>
>> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot 
>> wrote:
>>
>>> with conf option
>>>
>>> spark-submit --conf 'key = value '
>>>
>>> Hope that helps you.
>>>
>>> On 15 February 2016 at 11:21, Divya Gehlot 
>>> wrote:
>>>
 Hi Gourav,
 you can use like below to load packages at the start of the spark shell.

 spark-shell  --packages com.databricks:spark-csv_2.10:1.1.0

 On 14 February 2016 at 03:34, Gourav Sengupta <
 gourav.sengu...@gmail.com> wrote:

> Hi,
>
> I was interested in knowing how to load the packages into SPARK
> cluster started locally. Can someone pass me on the links to set the conf
> file so that the packages can be loaded?
>
> Regards,
> Gourav
>
> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz  wrote:
>
>> Hello Gourav,
>>
>> The packages need to be loaded BEFORE you start the JVM, therefore
>> you won't be able to add packages dynamically in code. You should use the
>> --packages with pyspark before you start your application.
>> One option is to add a `conf` that will load some packages if you are
>> constantly going to use them.
>>
>> Best,
>> Burak
>>
>>
>>
>> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <
>> gourav.sengu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am creating sparkcontext in a SPARK standalone cluster as
>>> mentioned here:
>>> http://spark.apache.org/docs/latest/spark-standalone.html using the
>>> following code:
>>>
>>>
>>> --
>>> sc.stop()
>>> conf = SparkConf().set( 'spark.driver.allowMultipleContexts' ,
>>> False) \
>>>   .setMaster("spark://hostname:7077") \
>>>   .set('spark.shuffle.service.enabled', True) \
>>>   .set('spark.dynamicAllocation.enabled','true') \
>>>   .set('spark.executor.memory','20g') \
>>>   .set('spark.driver.memory', '4g') \
>>>
>>> .set('spark.default.parallelism',(multiprocessing.cpu_count() -1 ))
>>> conf.getAll()
>>> sc = SparkContext(conf = conf)
>>>
>>> -(we should definitely be able to optimise the configuration but
>>> that is not the point here) ---
>>>
>>> I am not able to use packages, a list of which is mentioned here
>>> http://spark-packages.org, using this method.
>>>
>>> Where as if I use the standard "pyspark --packages" option then the
>>> packages load just fine.
>>>
>>> I will be grateful if someone could kindly let me know how to load
>>> packages when starting a cluster as mentioned above.
>>>
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>
>>
>

>>>
>>
>


Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Ramanathan R
Hi Gourav,

If your question is how to distribute Python package dependencies across
the Spark cluster programmatically, here is an example:

 $ export
PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'

And in code:

sc.addPyFile('/path/to/thrift.zip')
sc.addPyFile('/path/to/happybase.zip')

Regards,
Ram
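
For context, a slightly fuller sketch of the approach Ram describes; the zip paths, 
master URL, and the happybase import are placeholders, and this covers pure-Python 
dependencies only, not JVM packages such as spark-csv:

# Sketch: ship zipped Python dependencies to the executors with addPyFile.
# Paths and the master URL are placeholders.
from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setMaster("spark://hostname:7077").setAppName("pydeps"))

# Each zip is distributed to every executor and added to its Python path.
sc.addPyFile("/path/to/thrift.zip")
sc.addPyFile("/path/to/happybase.zip")

def uses_dependency(partition):
    # Import inside the task so the module resolves on the executor,
    # where the shipped zips are already on sys.path.
    import happybase  # placeholder dependency
    for record in partition:
        yield record

print(sc.parallelize(range(4), 2).mapPartitions(uses_dependency).collect())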



On 15 February 2016 at 15:16, Gourav Sengupta 
wrote:

> Hi,
>
> So far no one is able to get my question at all. I know what it takes to
> load packages via SPARK shell or SPARK submit.
>
> How do I load packages when starting a SPARK cluster, as mentioned here
> http://spark.apache.org/docs/latest/spark-standalone.html ?
>
>
> Regards,
> Gourav Sengupta
>
>
>
>
> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot 
> wrote:
>
>> with conf option
>>
>> spark-submit --conf 'key = value '
>>
>> Hope that helps you.
>>
>> On 15 February 2016 at 11:21, Divya Gehlot 
>> wrote:
>>
>>> Hi Gourav,
>>> you can use like below to load packages at the start of the spark shell.
>>>
>>> spark-shell  --packages com.databricks:spark-csv_2.10:1.1.0
>>>
>>> On 14 February 2016 at 03:34, Gourav Sengupta wrote:
>>>
 Hi,

 I was interested in knowing how to load the packages into SPARK cluster
 started locally. Can someone pass me on the links to set the conf file so
 that the packages can be loaded?

 Regards,
 Gourav

 On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz  wrote:

> Hello Gourav,
>
> The packages need to be loaded BEFORE you start the JVM, therefore you
> won't be able to add packages dynamically in code. You should use the
> --packages with pyspark before you start your application.
> One option is to add a `conf` that will load some packages if you are
> constantly going to use them.
>
> Best,
> Burak
>
>
>
> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> I am creating sparkcontext in a SPARK standalone cluster as
>> mentioned here:
>> http://spark.apache.org/docs/latest/spark-standalone.html using the
>> following code:
>>
>>
>> --
>> sc.stop()
>> conf = SparkConf().set( 'spark.driver.allowMultipleContexts' , False)
>> \
>>   .setMaster("spark://hostname:7077") \
>>   .set('spark.shuffle.service.enabled', True) \
>>   .set('spark.dynamicAllocation.enabled','true') \
>>   .set('spark.executor.memory','20g') \
>>   .set('spark.driver.memory', '4g') \
>>
>> .set('spark.default.parallelism',(multiprocessing.cpu_count() -1 ))
>> conf.getAll()
>> sc = SparkContext(conf = conf)
>>
>> -(we should definitely be able to optimise the configuration but
>> that is not the point here) ---
>>
>> I am not able to use packages, a list of which is mentioned here
>> http://spark-packages.org, using this method.
>>
>> Where as if I use the standard "pyspark --packages" option then the
>> packages load just fine.
>>
>> I will be grateful if someone could kindly let me know how to load
>> packages when starting a cluster as mentioned above.
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>
>

>>>
>>
>


Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi,

So far no one has understood my question at all. I know what it takes to
load packages via the SPARK shell or spark-submit.

How do I load packages when starting a SPARK cluster, as mentioned here
http://spark.apache.org/docs/latest/spark-standalone.html ?


Regards,
Gourav Sengupta




On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot 
wrote:

> with conf option
>
> spark-submit --conf 'key = value '
>
> Hope that helps you.
>
> On 15 February 2016 at 11:21, Divya Gehlot 
> wrote:
>
>> Hi Gourav,
>> you can use like below to load packages at the start of the spark shell.
>>
>> spark-shell  --packages com.databricks:spark-csv_2.10:1.1.0
>>
>> On 14 February 2016 at 03:34, Gourav Sengupta 
>> wrote:
>>
>>> Hi,
>>>
>>> I was interested in knowing how to load the packages into SPARK cluster
>>> started locally. Can someone pass me on the links to set the conf file so
>>> that the packages can be loaded?
>>>
>>> Regards,
>>> Gourav
>>>
>>> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz  wrote:
>>>
 Hello Gourav,

 The packages need to be loaded BEFORE you start the JVM, therefore you
 won't be able to add packages dynamically in code. You should use the
 --packages with pyspark before you start your application.
 One option is to add a `conf` that will load some packages if you are
 constantly going to use them.

 Best,
 Burak



 On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <
 gourav.sengu...@gmail.com> wrote:

> Hi,
>
> I am creating sparkcontext in a SPARK standalone cluster as mentioned
> here: http://spark.apache.org/docs/latest/spark-standalone.html using
> the following code:
>
>
> --
> sc.stop()
> conf = SparkConf().set( 'spark.driver.allowMultipleContexts' , False) \
>   .setMaster("spark://hostname:7077") \
>   .set('spark.shuffle.service.enabled', True) \
>   .set('spark.dynamicAllocation.enabled','true') \
>   .set('spark.executor.memory','20g') \
>   .set('spark.driver.memory', '4g') \
>
> .set('spark.default.parallelism',(multiprocessing.cpu_count() -1 ))
> conf.getAll()
> sc = SparkContext(conf = conf)
>
> -(we should definitely be able to optimise the configuration but
> that is not the point here) ---
>
> I am not able to use packages, a list of which is mentioned here
> http://spark-packages.org, using this method.
>
> Where as if I use the standard "pyspark --packages" option then the
> packages load just fine.
>
> I will be grateful if someone could kindly let me know how to load
> packages when starting a cluster as mentioned above.
>
>
> Regards,
> Gourav Sengupta
>


>>>
>>
>


Re: Using SPARK packages in Spark Cluster

2016-02-13 Thread Gourav Sengupta
Hi,

I was interested in knowing how to load the packages into a SPARK cluster
started locally. Can someone pass me the links for setting the conf file so
that the packages can be loaded?

Regards,
Gourav

On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz  wrote:

> Hello Gourav,
>
> The packages need to be loaded BEFORE you start the JVM, therefore you
> won't be able to add packages dynamically in code. You should use the
> --packages with pyspark before you start your application.
> One option is to add a `conf` that will load some packages if you are
> constantly going to use them.
>
> Best,
> Burak
>
>
>
> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> I am creating sparkcontext in a SPARK standalone cluster as mentioned
>> here: http://spark.apache.org/docs/latest/spark-standalone.html using
>> the following code:
>>
>>
>> --
>> sc.stop()
>> conf = SparkConf().set( 'spark.driver.allowMultipleContexts' , False) \
>>   .setMaster("spark://hostname:7077") \
>>   .set('spark.shuffle.service.enabled', True) \
>>   .set('spark.dynamicAllocation.enabled','true') \
>>   .set('spark.executor.memory','20g') \
>>   .set('spark.driver.memory', '4g') \
>>
>> .set('spark.default.parallelism',(multiprocessing.cpu_count() -1 ))
>> conf.getAll()
>> sc = SparkContext(conf = conf)
>>
>> -(we should definitely be able to optimise the configuration but that
>> is not the point here) ---
>>
>> I am not able to use packages, a list of which is mentioned here
>> http://spark-packages.org, using this method.
>>
>> Where as if I use the standard "pyspark --packages" option then the
>> packages load just fine.
>>
>> I will be grateful if someone could kindly let me know how to load
>> packages when starting a cluster as mentioned above.
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>
>


Using SPARK packages in Spark Cluster

2016-02-12 Thread Gourav Sengupta
Hi,

I am creating a SparkContext in a SPARK standalone cluster as mentioned here:
http://spark.apache.org/docs/latest/spark-standalone.html using the
following code:

--
import multiprocessing
from pyspark import SparkConf, SparkContext

sc.stop()  # stop the existing SparkContext before creating a new one
conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
  .setMaster("spark://hostname:7077") \
  .set('spark.shuffle.service.enabled', True) \
  .set('spark.dynamicAllocation.enabled', 'true') \
  .set('spark.executor.memory', '20g') \
  .set('spark.driver.memory', '4g') \
  .set('spark.default.parallelism', (multiprocessing.cpu_count() - 1))
conf.getAll()
sc = SparkContext(conf=conf)

-(we should definitely be able to optimise the configuration but that
is not the point here) ---

Using this method, I am not able to load packages (a list of which is available
at http://spark-packages.org).

Whereas if I use the standard "pyspark --packages" option, the packages load
just fine.

I will be grateful if someone could kindly let me know how to load packages
when starting a cluster as mentioned above.


Regards,
Gourav Sengupta


Re: Using SPARK packages in Spark Cluster

2016-02-12 Thread Burak Yavuz
Hello Gourav,

The packages need to be loaded BEFORE you start the JVM, therefore you
won't be able to add packages dynamically in code. You should pass --packages
to pyspark when you start your application.
One option is to add a `conf` entry that will load some packages if you are
constantly going to use them.

Best,
Burak
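
A minimal sketch of the two routes described above, assuming a Spark 1.x PySpark
driver in client mode; the master URL and package coordinates come from earlier in
the thread, everything else is an assumption rather than a confirmed recipe:

# Sketch only. Route 1: make --packages visible before the driver JVM launches.
# PYSPARK_SUBMIT_ARGS must be set before the SparkContext is created and must
# end with "pyspark-shell".
#
# Route 2 (the `conf` option): put the coordinates in conf/spark-defaults.conf
# on the submitting machine, e.g.
#   spark.jars.packages  com.databricks:spark-csv_2.10:1.3.0
import os
import multiprocessing

os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell"
)

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://hostname:7077")
        .set("spark.executor.memory", "20g")
        .set("spark.default.parallelism", str(multiprocessing.cpu_count() - 1)))
sc = SparkContext(conf=conf)

Either way, the coordinates are resolved when spark-submit launches the driver JVM,
which is consistent with the point that they cannot be added dynamically afterwards.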



On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta 
wrote:

> Hi,
>
> I am creating sparkcontext in a SPARK standalone cluster as mentioned
> here: http://spark.apache.org/docs/latest/spark-standalone.html using the
> following code:
>
>
> --
> sc.stop()
> conf = SparkConf().set( 'spark.driver.allowMultipleContexts' , False) \
>   .setMaster("spark://hostname:7077") \
>   .set('spark.shuffle.service.enabled', True) \
>   .set('spark.dynamicAllocation.enabled','true') \
>   .set('spark.executor.memory','20g') \
>   .set('spark.driver.memory', '4g') \
>
> .set('spark.default.parallelism',(multiprocessing.cpu_count() -1 ))
> conf.getAll()
> sc = SparkContext(conf = conf)
>
> -(we should definitely be able to optimise the configuration but that
> is not the point here) ---
>
> I am not able to use packages, a list of which is mentioned here
> http://spark-packages.org, using this method.
>
> Where as if I use the standard "pyspark --packages" option then the
> packages load just fine.
>
> I will be grateful if someone could kindly let me know how to load
> packages when starting a cluster as mentioned above.
>
>
> Regards,
> Gourav Sengupta
>