[ https://issues.apache.org/jira/browse/SPARK-21881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-21881.
-------------------------------
    Resolution: Duplicate

Let's not fork the discussion. Post on the original issue, and someone can reopen it if they believe it's valid. All the better if you have a pull request.

> Again: OOM killer may leave SparkContext in broken state causing Connection Refused errors
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21881
>                 URL: https://issues.apache.org/jira/browse/SPARK-21881
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>            Reporter: Kai Londenberg
>            Assignee: Alexander Shorin
>
> This is a duplicate of SPARK-18523, which was not really fixed for me (PySpark 2.2.0, Python 3.5, py4j 0.10.4).
>
> *Original Summary:*
> When you run a memory-heavy Spark job, the Spark driver may consume more memory than the host can provide. In that case the OOM killer comes on the scene and kills the spark-submit process.
> The pyspark.SparkContext cannot handle this situation and is left completely broken. You cannot stop it: stop() tries to call the stop method of the bound Java context (jsc) and fails with a Py4JError, because that process no longer exists, and neither does the connection to it. You cannot start a new SparkContext either, because the broken one is still registered as the active context, and PySpark still insists on treating SparkContext as a singleton.
> The only things you can do are to shut down your IPython Notebook and start over, or to dive into the internal attributes of SparkContext and manually reset them to their initial None state.
> The OOM killer is just one case of many: any crash of spark-submit in the middle of a job leaves the SparkContext in this broken state.
>
> *Latest Comment:*
> In PySpark 2.2.0 this issue was not really fixed. While I could close the SparkContext (it raised an exception message, but it was closed afterwards), I could not open any new SparkContext.
>
> *Current Workaround:*
> If I reset the global SparkContext state like this, it worked:
> {code:python}
> def reset_spark():
>     import pyspark
>     from threading import RLock
>     # Clear the class-level singleton state that PySpark keeps on
>     # SparkContext, so that a new context can be created afterwards.
>     pyspark.SparkContext._jvm = None
>     pyspark.SparkContext._gateway = None
>     pyspark.SparkContext._next_accum_id = 0
>     pyspark.SparkContext._active_spark_context = None
>     pyspark.SparkContext._lock = RLock()
>     pyspark.SparkContext._python_includes = None
>
> reset_spark()
> {code}
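A natural extension of that workaround is to fold the reset into a guarded stop(), so a dead context is torn down and replaced in one call. The sketch below is a hypothetical helper (recover_spark_context is not part of PySpark) and assumes the same private PySpark 2.2 attributes used above; the Py4JError it swallows is the one described in the report, raised when the driver JVM is already gone:

{code:python}
from threading import RLock

import pyspark
from py4j.protocol import Py4JError


def recover_spark_context(conf=None):
    """Stop a possibly-broken SparkContext and return a fresh one."""
    sc = pyspark.SparkContext._active_spark_context
    if sc is not None:
        try:
            sc.stop()
        except Py4JError:
            # The driver JVM was killed (e.g. by the OOM killer), so
            # stop() cannot reach it; fall through to a manual reset.
            pass

    # Reset the class-level singleton state, exactly as reset_spark()
    # above does, so the SparkContext constructor no longer believes a
    # context is still active. These are private attributes and may
    # change between PySpark releases.
    pyspark.SparkContext._jvm = None
    pyspark.SparkContext._gateway = None
    pyspark.SparkContext._next_accum_id = 0
    pyspark.SparkContext._active_spark_context = None
    pyspark.SparkContext._lock = RLock()
    pyspark.SparkContext._python_includes = None

    # Start a fresh context; conf is an optional pyspark.SparkConf.
    return pyspark.SparkContext(conf=conf)
{code}

After an OOM kill, sc = recover_spark_context() should then replace the dead context from the same notebook session, without restarting the kernel.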