[
https://issues.apache.org/jira/browse/SPARK-21881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kai Londenberg updated SPARK-21881:
-----------------------------------
Affects Version/s: (was: 1.6.1)
(was: 2.0.0)
> Again: OOM killer may leave SparkContext in broken state causing Connection
> Refused errors
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-21881
> URL: https://issues.apache.org/jira/browse/SPARK-21881
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.2.0
> Reporter: Kai Londenberg
> Assignee: Alexander Shorin
>
> This is a duplicate of SPARK-18523, which was not really fixed for me
> (PySpark 2.2.0, Python 3.5, py4j 0.10.4)
> *Original Summary:*
> When you run a memory-heavy Spark job, the Spark driver may consume more
> memory than the host can provide.
> In that case the OOM killer comes on the scene and kills the spark-submit
> process.
> pyspark.SparkContext cannot handle this state of things and becomes
> completely broken.
> You cannot stop it: on stop it tries to call the stop method of the bound
> Java context (jsc) and fails with a Py4JError, because that process no longer
> exists, and neither does the connection to it.
> You cannot start a new SparkContext either, because the broken one is still
> registered as the active one, and PySpark still treats SparkContext as a
> singleton.
> The only thing you can do is shut down your IPython notebook and start it
> over, or dive into SparkContext's internal attributes and reset them manually
> to their initial None state.
> The OOM killer case is just one of many: any spark-submit crash in the
> middle of something leaves SparkContext in a broken state.
> *Latest Comment*
> In PySpark 2.2.0 this issue is not really fixed. While I could close the
> SparkContext (it raised an exception, but was closed afterwards), I could
> not open any new SparkContext.
> *Current Workaround*
> Resetting the global SparkContext state like this worked for me:
> {code:python}
> def reset_spark():
>     import pyspark
>     from threading import RLock
>     # Clear the class-level singleton state left behind by the dead driver.
>     pyspark.SparkContext._jvm = None
>     pyspark.SparkContext._gateway = None
>     pyspark.SparkContext._next_accum_id = 0
>     pyspark.SparkContext._active_spark_context = None
>     pyspark.SparkContext._lock = RLock()
>     pyspark.SparkContext._python_includes = None
>
> reset_spark()
> {code}
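The effect of the workaround can be modelled with a minimal, self-contained sketch. `ToyContext` below is a hypothetical stand-in for `pyspark.SparkContext`'s singleton bookkeeping, not PySpark's actual implementation: it shows why a dead-but-still-registered context blocks new ones, and why clearing the class-level attributes (as `reset_spark()` does) allows a fresh context to be constructed.

```python
from threading import RLock


class ToyContext:
    """Toy stand-in for SparkContext's class-level singleton state."""
    _active_context = None  # analogous to SparkContext._active_spark_context
    _lock = RLock()         # analogous to SparkContext._lock

    def __init__(self):
        with ToyContext._lock:
            # PySpark similarly refuses to create a second active context.
            if ToyContext._active_context is not None:
                raise ValueError("Cannot run multiple contexts at once")
            ToyContext._active_context = self


def reset_toy():
    # Mirrors the reset_spark() workaround: clear the class-level
    # singleton state so a fresh context can be constructed.
    ToyContext._active_context = None
    ToyContext._lock = RLock()


first = ToyContext()
# Simulate a crashed driver: 'first' is broken but still registered,
# so constructing a second context fails.
try:
    ToyContext()
    blocked = False
except ValueError:
    blocked = True

reset_toy()
second = ToyContext()  # succeeds after the reset
```

The point of the sketch is that the blocking state lives on the class, not on the instance, which is why discarding the broken object alone does not help and the class attributes must be reset.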
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]