Brad Willard created SPARK-10488:
------------------------------------

             Summary: No longer possible to create SparkConf in pyspark application
                 Key: SPARK-10488
                 URL: https://issues.apache.org/jira/browse/SPARK-10488
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.4.1, 1.4.0
         Environment: pyspark on ec2 deployed cluster
            Reporter: Brad Willard
I used to be able to create SparkContext connections directly in IPython notebooks so that each notebook could have different resources on the cluster. This worked perfectly until Spark 1.4.x. The following code worked on all previous versions of Spark and no longer works:

{code}
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

cpus = 15
ram = 5

conf = SparkConf().set('spark.executor.memory', '%sg' % ram).set('spark.cores.max', str(cpus))

cluster_url = 'spark://%s:7077' % master
job_name = 'test'

sc = SparkContext(cluster_url, job_name, conf=conf)
{code}

It errors on the SparkConf() line: you can't even construct that object in Python now without the SparkContext machinery (the JVM gateway) already being initialized, which makes no sense to me.

{code}
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-4-453520c03f2b> in <module>()
      5 ram = 5
      6
----> 7 conf = SparkConf().set('spark.executor.memory', '%sg' % ram).set('spark.cores.max', str(cpus))
      8
      9 cluster_url = 'spark://%s:7077' % master

/root/spark/python/pyspark/conf.py in __init__(self, loadDefaults, _jvm, _jconf)
    102         else:
    103             from pyspark.context import SparkContext
--> 104             SparkContext._ensure_initialized()
    105             _jvm = _jvm or SparkContext._jvm
    106             self._jconf = _jvm.SparkConf(loadDefaults)

/root/spark/python/pyspark/context.py in _ensure_initialized(cls, instance, gateway)
    227         with SparkContext._lock:
    228             if not SparkContext._gateway:
--> 229                 SparkContext._gateway = gateway or launch_gateway()
    230                 SparkContext._jvm = SparkContext._gateway.jvm
    231

/root/spark/python/pyspark/java_gateway.py in launch_gateway()
     87             callback_socket.close()
     88         if gateway_port is None:
---> 89             raise Exception("Java gateway process exited before sending the driver its port number")
     90
     91     # In Windows, ensure the Java child processes do not linger after Python has exited.

Exception: Java gateway process exited before sending the driver its port number
{code}

I am able to work around this by setting all the PySpark environment variables for the IPython notebook, but then every notebook is forced to use the same resources, which isn't great if you run lots of different kinds of jobs ad hoc.
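For reference, a rough sketch of the kind of environment-based configuration I mean (not a fix; it assumes PYSPARK_SUBMIT_ARGS is what launch_gateway() reads in 1.4.x, that it has to end with "pyspark-shell", and that master is the cluster master hostname as above):

{code}
import os

# Sketch of the environment-based workaround: set the submit args before the
# first SparkConf()/SparkContext() call so the JVM gateway is launched with them.
cpus = 15
ram = 5
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--master spark://%s:7077 '
    '--conf spark.executor.memory=%sg '
    '--conf spark.cores.max=%s '
    'pyspark-shell' % (master, ram, cpus)
)

from pyspark import SparkConf, SparkContext

conf = SparkConf()            # the JVM gateway is launched here with the args above
sc = SparkContext(conf=conf)
{code}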