[ https://issues.apache.org/jira/browse/SPARK-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-10488.
-------------------------------
Resolution: Not A Problem
I *think* this is not intended to work, or at least the rule that you can't
create multiple contexts is now enforced more strictly. There's a workaround.
This can be reopened if there's reason to believe it should actually work, but
AFAIK it's one context per process.
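For illustration, a minimal sketch of what one-context-per-process looks like in
a notebook: stop the existing context before creating a new one with different
resources. The master URL and resource values here are placeholders, not part of
the original report.
{code}
from pyspark import SparkConf, SparkContext

# One SparkContext per Python process: stop any existing context before
# creating a new one with different resources.
try:
    sc.stop()   # skipped if no context named sc exists yet
except NameError:
    pass

conf = (SparkConf()
        .setMaster('spark://master-host:7077')   # placeholder master URL
        .setAppName('test')
        .set('spark.executor.memory', '5g')
        .set('spark.cores.max', '15'))
sc = SparkContext(conf=conf)
{code}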
> No longer possible to create SparkConf in pyspark application
> -------------------------------------------------------------
>
> Key: SPARK-10488
> URL: https://issues.apache.org/jira/browse/SPARK-10488
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.4.0, 1.4.1
> Environment: pyspark on ec2 deployed cluster
> Reporter: Brad Willard
>
> I used to be able to create SparkContext connections directly in IPython
> notebooks so that each notebook could have different resources on the
> cluster. This worked perfectly until Spark 1.4.x.
> The following code worked on all previous versions of Spark and no longer works:
> {code}
> from pyspark import SparkConf, SparkContext
> from pyspark.sql import SQLContext
> cpus = 15
> ram = 5
> # master is assumed to be defined earlier in the notebook
> conf = SparkConf().set('spark.executor.memory', '%sg' % ram).set('spark.cores.max', str(cpus))
> cluster_url = 'spark://%s:7077' % master
> job_name = 'test'
> sc = SparkContext(cluster_url, job_name, conf=conf)
> {code}
> It errors on the SparkConf() line: you can't even construct that object in
> Python now without the SparkContext machinery (the JVM gateway) already
> initialized, which makes no sense to me.
> {code}
> ---------------------------------------------------------------------------
> Exception                                 Traceback (most recent call last)
> <ipython-input-4-453520c03f2b> in <module>()
>       5 ram = 5
>       6
> ----> 7 conf = SparkConf().set('spark.executor.memory', '%sg' % ram).set('spark.cores.max', str(cpus))
>       8
>       9 cluster_url = 'spark://%s:7077' % master
>
> /root/spark/python/pyspark/conf.py in __init__(self, loadDefaults, _jvm, _jconf)
>     102         else:
>     103             from pyspark.context import SparkContext
> --> 104             SparkContext._ensure_initialized()
>     105             _jvm = _jvm or SparkContext._jvm
>     106             self._jconf = _jvm.SparkConf(loadDefaults)
>
> /root/spark/python/pyspark/context.py in _ensure_initialized(cls, instance, gateway)
>     227         with SparkContext._lock:
>     228             if not SparkContext._gateway:
> --> 229                 SparkContext._gateway = gateway or launch_gateway()
>     230                 SparkContext._jvm = SparkContext._gateway.jvm
>     231
>
> /root/spark/python/pyspark/java_gateway.py in launch_gateway()
>      87             callback_socket.close()
>      88         if gateway_port is None:
> ---> 89             raise Exception("Java gateway process exited before sending the driver its port number")
>      90
>      91     # In Windows, ensure the Java child processes do not linger after Python has exited.
>
> Exception: Java gateway process exited before sending the driver its port number
> {code}
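> Based on that call chain, a minimal sketch (not part of the original notebook)
> to check whether the gateway launch itself is what fails, independent of
> SparkConf; launch_gateway here is the same function shown in the traceback:
> {code}
> # Call the gateway launcher directly, as conf.py does indirectly via
> # SparkContext._ensure_initialized(), to see whether the JVM gateway
> # fails to start on its own.
> from pyspark.java_gateway import launch_gateway
>
> gateway = launch_gateway()   # raises the same "Java gateway process exited"
>                              # exception if the launch fails
> print(gateway.jvm.java.lang.System.getProperty('java.version'))
> {code}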
> I am able to work around this by setting all the PySpark environment variables
> for the IPython notebook, but then each notebook is forced to use the same
> resources, which isn't great if you run lots of different types of jobs ad hoc.
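> A sketch of a per-notebook variant of that workaround, assuming the notebook
> sets PYSPARK_SUBMIT_ARGS before PySpark is initialized; the master URL and
> resource values below are placeholders mirroring the report:
> {code}
> import os
>
> # Set per-notebook resources through the environment before the first
> # SparkConf/SparkContext is created, so the gateway is launched with them.
> os.environ['PYSPARK_SUBMIT_ARGS'] = (
>     '--master spark://master-host:7077 '
>     '--conf spark.executor.memory=5g '
>     '--conf spark.cores.max=15 '
>     'pyspark-shell'
> )
>
> from pyspark import SparkContext
> sc = SparkContext(appName='test')
> {code}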