[
https://issues.apache.org/jira/browse/SPARK-21881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kai Londenberg updated SPARK-21881:
-----------------------------------
Affects Version/s: (was: 1.6.1)
(was: 2.0.0)
> Again: OOM killer may leave SparkContext in broken state causing Connection
> Refused errors
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-21881
> URL: https://issues.apache.org/jira/browse/SPARK-21881
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.2.0
> Reporter: Kai Londenberg
> Assignee: Alexander Shorin
>
> This is a duplicate of SPARK-18523, which was not really fixed for me
> (PySpark 2.2.0, Python 3.5, py4j 0.10.4)
> *Original Summary:*
> When you run a memory-heavy Spark job, the Spark driver may consume more
> memory than the host can provide.
> In that case the OOM killer comes on the scene and kills the spark-submit
> process.
> pyspark.SparkContext cannot handle this state of things and becomes
> completely broken.
> You cannot stop it: on stop it tries to call the stop method of the bound
> Java context (jsc) and fails with a Py4JError, because that process no longer
> exists, and neither does the connection to it.
> You cannot start a new SparkContext either, because the broken one is still
> registered as the active one, and PySpark still treats SparkContext as a
> singleton.
> The only thing you can do is shut down your IPython notebook and start it
> over, or dive into SparkContext's internal attributes and reset them manually
> to their initial None state.
> The OOM killer case is just one of many: any spark-submit crash in the
> middle of something leaves SparkContext in a broken state.
> *Latest Comment*
> In PySpark 2.2.0 this issue is not really fixed. While I could close the
> SparkContext (it raised an exception, but was closed afterwards), I could
> not open any new SparkContext.
> *Current Workaround*
> Resetting the global SparkContext state like this worked for me:
> {code:python}
> def reset_spark():
>     import pyspark
>     from threading import RLock
>     # Clear the class-level singleton state left behind by the dead driver.
>     pyspark.SparkContext._jvm = None
>     pyspark.SparkContext._gateway = None
>     pyspark.SparkContext._next_accum_id = 0
>     pyspark.SparkContext._active_spark_context = None
>     pyspark.SparkContext._lock = RLock()
>     pyspark.SparkContext._python_includes = None
>
> reset_spark()
> {code}
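The effect of the workaround can be modelled with a minimal, self-contained sketch. `ToyContext` below is a hypothetical stand-in for `pyspark.SparkContext`'s singleton bookkeeping, not PySpark's actual implementation: it shows why a dead-but-still-registered context blocks new ones, and why clearing the class-level attributes (as `reset_spark()` does) allows a fresh context to be constructed.

```python
from threading import RLock


class ToyContext:
    """Toy stand-in for SparkContext's class-level singleton state."""
    _active_context = None  # analogous to SparkContext._active_spark_context
    _lock = RLock()         # analogous to SparkContext._lock

    def __init__(self):
        with ToyContext._lock:
            # PySpark similarly refuses to create a second active context.
            if ToyContext._active_context is not None:
                raise ValueError("Cannot run multiple contexts at once")
            ToyContext._active_context = self


def reset_toy():
    # Mirrors the reset_spark() workaround: clear the class-level
    # singleton state so a fresh context can be constructed.
    ToyContext._active_context = None
    ToyContext._lock = RLock()


first = ToyContext()
# Simulate a crashed driver: 'first' is broken but still registered,
# so constructing a second context fails.
try:
    ToyContext()
    blocked = False
except ValueError:
    blocked = True

reset_toy()
second = ToyContext()  # succeeds after the reset
```

The point of the sketch is that the blocking state lives on the class, not on the instance, which is why discarding the broken object alone does not help and the class attributes must be reset.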
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]