Brad Willard created SPARK-10488:
------------------------------------

             Summary: No longer possible to create SparkConf in pyspark application
                 Key: SPARK-10488
                 URL: https://issues.apache.org/jira/browse/SPARK-10488
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.4.1, 1.4.0
         Environment: pyspark on ec2 deployed cluster
            Reporter: Brad Willard


I used to be able to create SparkContext connections directly in IPython notebooks so that each notebook could claim different resources on the cluster. This worked perfectly until Spark 1.4.x.

The following code worked on all previous versions of Spark and no longer works:

{code}
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

cpus = 15
ram = 5

conf = SparkConf().set('spark.executor.memory', '%sg' % ram).set('spark.cores.max', str(cpus))

# master is the standalone master hostname, defined earlier in the notebook
cluster_url = 'spark://%s:7077' % master

job_name = 'test'
sc = SparkContext(cluster_url, job_name, conf=conf)
{code}

It errors on the SparkConf() line: in 1.4.x you can't even construct that object in Python without the SparkContext machinery already initialized, which makes no sense to me.

{code}
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-4-453520c03f2b> in <module>()
      5 ram = 5
      6 
----> 7 conf = SparkConf().set('spark.executor.memory', '%sg' % ram).set('spark.cores.max', str(cpus))
      8 
      9 cluster_url = 'spark://%s:7077' % master

/root/spark/python/pyspark/conf.py in __init__(self, loadDefaults, _jvm, _jconf)
    102         else:
    103             from pyspark.context import SparkContext
--> 104             SparkContext._ensure_initialized()
    105             _jvm = _jvm or SparkContext._jvm
    106             self._jconf = _jvm.SparkConf(loadDefaults)

/root/spark/python/pyspark/context.py in _ensure_initialized(cls, instance, gateway)
    227         with SparkContext._lock:
    228             if not SparkContext._gateway:
--> 229                 SparkContext._gateway = gateway or launch_gateway()
    230                 SparkContext._jvm = SparkContext._gateway.jvm
    231 

/root/spark/python/pyspark/java_gateway.py in launch_gateway()
     87                 callback_socket.close()
     88         if gateway_port is None:
---> 89             raise Exception("Java gateway process exited before sending the driver its port number")
     90 
     91         # In Windows, ensure the Java child processes do not linger after Python has exited.

Exception: Java gateway process exited before sending the driver its port number
{code}
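
For what it's worth, the failure happens before any of my cluster settings are even used. A minimal repro (a sketch, assuming a plain python/ipython process that isn't set up to launch spark-submit, like my notebook kernels) is just:

{code}
# On 1.4.x, merely constructing SparkConf attempts to start the JVM gateway
# (SparkContext._ensure_initialized -> launch_gateway), so this alone fails
# with the same "Java gateway process exited" error.
from pyspark import SparkConf

conf = SparkConf()
{code}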

I am able to work around this by setting the pyspark environment variables before the IPython notebook server starts, but then every notebook is forced to use the same resources, which isn't great if you run lots of different types of jobs ad hoc.


