[jira] [Commented] (SPARK-16262) Impossible to remake new SparkContext using SparkSession API in Pyspark

Vladimir Feinberg (JIRA) Tue, 28 Jun 2016 13:20:06 -0700

    [ https://issues.apache.org/jira/browse/SPARK-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353642#comment-15353642 ]


Vladimir Feinberg commented on SPARK-16262:
-------------------------------------------

Ah, are you suggesting that the {{SparkSession._instantiatedContext = None}} 
line should go inside {{SparkSession.stop()}}? I'm totally OK with that, but 
then that's a fix for this bug, right? In other words, your comment wasn't an 
objection to the JIRA itself?
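
To make the proposal concrete, here is a minimal sketch of the mechanism as I 
understand it. It is a toy model, not PySpark's actual code: ToyContext, 
ToySession, get_or_create, alive and the clear_cache flag are all hypothetical 
names made up for illustration. Only the {{_instantiatedContext}} attribute name 
comes from the issue description. The assumption is that the class-level cache 
keeps pointing at a stopped context unless stop() resets it.

```python
class ToyContext:
    """Stand-in for a SparkContext; _jsc mimics the handle that stop() drops."""

    def __init__(self):
        self._jsc = object()

    def stop(self):
        self._jsc = None

    @property
    def alive(self):
        return self._jsc is not None


class ToySession:
    """Stand-in for a SparkSession with a class-level context cache."""

    # Analogous to the hidden global named in the issue description.
    _instantiatedContext = None

    def __init__(self, context):
        self._sc = context

    @classmethod
    def get_or_create(cls):
        # Reuse the cached context if one exists, even if it was stopped.
        if cls._instantiatedContext is None:
            cls._instantiatedContext = ToyContext()
        return cls(cls._instantiatedContext)

    def stop(self, clear_cache=False):
        self._sc.stop()
        if clear_cache:
            # The proposed fix: reset the cache as part of stop().
            type(self)._instantiatedContext = None


# Without the reset, the second session wraps the stopped context.
first = ToySession.get_or_create()
first.stop()
stale = ToySession.get_or_create()
stale_is_alive = stale._sc.alive      # False

# With the reset inside stop(), the next session gets a fresh context.
ToySession._instantiatedContext = None
second = ToySession.get_or_create()
second.stop(clear_cache=True)
fresh = ToySession.get_or_create()
fresh_is_alive = fresh._sc.alive      # True
```

In this model the manual reset and the reset inside stop() have the same 
effect; the second just means callers no longer need to know about the cache.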

> Impossible to remake new SparkContext using SparkSession API in Pyspark
> -----------------------------------------------------------------------
>
>                 Key: SPARK-16262
>                 URL: https://issues.apache.org/jira/browse/SPARK-16262
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Vladimir Feinberg
>            Priority: Minor
>
> There are several use cases for stopping and restarting a {{SparkSession}}, 
> such as configuration changes and modular testing. The following code 
> demonstrates that, unless the hidden global is cleared with 
> {{SparkSession._instantiatedContext = None}}, it is impossible to create a 
> working Spark session after stopping one in the same process:
> {code}
> >>> from pyspark.sql import SparkSession
> >>> spark = SparkSession.builder.getOrCreate()
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/06/28 11:28:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/06/28 11:28:10 WARN Utils: Your hostname, vlad-databricks resolves to a loopback address: 127.0.1.1; using 192.168.3.166 instead (on interface enp0s31f6)
> 16/06/28 11:28:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> >>> spark.stop()
> >>> spark = SparkSession.builder.getOrCreate()
> >>> spark.createDataFrame([(1,)])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyspark/sql/session.py", line 514, in createDataFrame
>     rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "pyspark/sql/session.py", line 394, in _createFromLocal
>     return self._sc.parallelize(data), schema
>   File "pyspark/context.py", line 410, in parallelize
>     numSlices = int(numSlices) if numSlices is not None else self.defaultParallelism
>   File "pyspark/context.py", line 346, in defaultParallelism
>     return self._jsc.sc().defaultParallelism()
> AttributeError: 'NoneType' object has no attribute 'sc'
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

