[ https://issues.apache.org/jira/browse/SPARK-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735141#comment-14735141 ]
Brad Willard edited comment on SPARK-10488 at 9/8/15 4:54 PM:
--------------------------------------------------------------

[~srowen] I have it working via that method at the moment. It's just annoying because it forces each notebook to use the same amount of resources on the cluster, whereas I was able to configure that through SparkConf on all previous versions of Spark with the above code. My use case is that some notebooks run a deep historical job on 4 billion rows and need the entire cluster, while other notebooks look at smaller datasets (1-5 million rows) that need only 1/10th of the cluster. I really dislike that I've lost that configurability now.

was (Author: brdwrd):
[~srowen] I have it working via that method at the moment. It's just annoying because it forces each notebook to use the same amount of resources on the cluster, whereas I was able to configure that through SparkConf on all previous versions of Spark with the above code. So I have some notebooks that are doing a deep historical job on 4 billion rows that require the entire cluster and would request those resources, however other notebooks would look at smaller datasets (1-5 million) that require only 1/10th of the cluster. I really dislike that I've lost that configurability now.

> No longer possible to create SparkConf in pyspark application
> -------------------------------------------------------------
>
>                 Key: SPARK-10488
>                 URL: https://issues.apache.org/jira/browse/SPARK-10488
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.4.0, 1.4.1
>        Environment: pyspark on ec2 deployed cluster
>            Reporter: Brad Willard
>
> I used to be able to make SparkContext connections directly in ipython
> notebooks so that each notebook could have different resources on the
> cluster. This worked perfectly until spark 1.4.x.
> This code worked on all previous versions of spark and no longer works:
> {code}
> from pyspark import SparkConf, SparkContext
> from pyspark.sql import SQLContext
>
> cpus = 15
> ram = 5
>
> conf = SparkConf().set('spark.executor.memory', '%sg' % ram).set('spark.cores.max', str(cpus))
>
> cluster_url = 'spark://%s:7077' % master
> job_name = 'test'
> sc = SparkContext(cluster_url, job_name, conf=conf)
> {code}
> It errors on the SparkConf() line because you can't even make that object in
> python now without the SparkContext already created, which makes no sense
> to me.
> {code}
> ---------------------------------------------------------------------------
> Exception                                 Traceback (most recent call last)
> <ipython-input-4-453520c03f2b> in <module>()
>       5 ram = 5
>       6
> ----> 7 conf = SparkConf().set('spark.executor.memory', '%sg' % ram).set('spark.cores.max', str(cpus))
>       8
>       9 cluster_url = 'spark://%s:7077' % master
>
> /root/spark/python/pyspark/conf.py in __init__(self, loadDefaults, _jvm, _jconf)
>     102         else:
>     103             from pyspark.context import SparkContext
> --> 104             SparkContext._ensure_initialized()
>     105             _jvm = _jvm or SparkContext._jvm
>     106         self._jconf = _jvm.SparkConf(loadDefaults)
>
> /root/spark/python/pyspark/context.py in _ensure_initialized(cls, instance, gateway)
>     227         with SparkContext._lock:
>     228             if not SparkContext._gateway:
> --> 229                 SparkContext._gateway = gateway or launch_gateway()
>     230                 SparkContext._jvm = SparkContext._gateway.jvm
>     231
>
> /root/spark/python/pyspark/java_gateway.py in launch_gateway()
>      87             callback_socket.close()
>      88         if gateway_port is None:
> ---> 89             raise Exception("Java gateway process exited before sending the driver its port number")
>      90
>      91     # In Windows, ensure the Java child processes do not linger after Python has exited.
> Exception: Java gateway process exited before sending the driver its port number
> {code}
> I am able to work around this by setting all the pyspark environment
> variables for the ipython notebook, but then each notebook is forced to have
> the same resources, which isn't great if you run lots of different types of
> jobs ad hoc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
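For reference, the environment-variable workaround discussed above can be sketched as follows. This is a minimal sketch, not the reporter's exact setup: the master hostname and the resource values are placeholders, and PYSPARK_SUBMIT_ARGS must be set before the notebook process starts pyspark, which is why every notebook launched from that environment ends up with the same resources.

```python
import os

# Hypothetical per-environment values; with this workaround they apply to
# every notebook started from this environment, not per notebook.
cpus = 15
ram_gb = 5
master = 'master-host'  # placeholder hostname

# Bake the resource settings into PYSPARK_SUBMIT_ARGS before pyspark is
# imported; pyspark's launcher reads this variable when it starts the JVM.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--master spark://%s:7077 '
    '--conf spark.executor.memory=%dg '
    '--conf spark.cores.max=%d '
    'pyspark-shell' % (master, ram_gb, cpus)
)
```

Because the variable is read once at JVM launch, changing it from inside an already-running notebook has no effect, which is exactly the loss of per-notebook configurability the comment complains about.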