[
https://issues.apache.org/jira/browse/SPARK-18523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin resolved SPARK-18523.
---------------------------------
Resolution: Fixed
Assignee: Alexander Shorin
Fix Version/s: 2.1.0
> OOM killer may leave SparkContext in broken state causing Connection Refused
> errors
> -----------------------------------------------------------------------------------
>
> Key: SPARK-18523
> URL: https://issues.apache.org/jira/browse/SPARK-18523
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.6.1, 2.0.0
> Reporter: Alexander Shorin
> Assignee: Alexander Shorin
> Fix For: 2.1.0
>
>
> When you run a memory-heavy Spark job, the Spark driver may consume more
> memory than the host is able to provide.
> In this case the OOM killer steps in and kills the spark-submit
> process.
> The pyspark.SparkContext is not able to handle this situation and
> becomes completely broken.
> You cannot stop it, because stop() tries to call the stop method of the bound
> Java context (jsc) and fails with Py4JError: that process no longer exists,
> and neither does the connection to it.
> You cannot start a new SparkContext, because the broken one is still
> registered as the active one and pyspark still requires SparkContext to be a
> sort of singleton.
> The only thing you can do is shut down your IPython Notebook and start it
> over, or dive into the SparkContext internal attributes and reset them
> manually to their initial None state.
> The OOM killer case is just one of many: any crash of spark-submit
> in the middle of its work leaves SparkContext in a broken state.
> Example error log from {{sc.stop()}} in the broken state:
> {code}
> ERROR:root:Exception while sending command.
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 883, in send_command
>     response = connection.send_command(command)
>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 1040, in send_command
>     "Error while receiving", e, proto.ERROR_ON_RECEIVE)
> Py4JNetworkError: Error while receiving
> ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:59911)
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 963, in start
>     self.socket.connect((self.address, self.port))
>   File "/usr/local/lib/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 61] Connection refused
> ---------------------------------------------------------------------------
> Py4JError Traceback (most recent call last)
> <ipython-input-2-f154e069615b> in <module>()
> ----> 1 sc.stop()
> /usr/local/share/spark/python/pyspark/context.py in stop(self)
> 360 """
> 361 if getattr(self, "_jsc", None):
> --> 362 self._jsc.stop()
> 363 self._jsc = None
> 364 if getattr(self, "_accumulatorServer", None):
> /usr/local/lib/python2.7/site-packages/py4j/java_gateway.pyc in __call__(self, *args)
> 1131 answer = self.gateway_client.send_command(command)
> 1132 return_value = get_return_value(
> -> 1133 answer, self.gateway_client, self.target_id, self.name)
> 1134
> 1135 for temp_arg in temp_args:
> /usr/local/share/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
> 43 def deco(*a, **kw):
> 44 try:
> ---> 45 return f(*a, **kw)
> 46 except py4j.protocol.Py4JJavaError as e:
> 47 s = e.java_exception.toString()
> /usr/local/lib/python2.7/site-packages/py4j/protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
> 325 raise Py4JError(
> 326 "An error occurred while calling {0}{1}{2}".
> --> 327 format(target_id, ".", name))
> 328 else:
> 329 type = answer[1]
> Py4JError: An error occurred while calling o47.stop
> {code}
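
For readers on an affected version (1.6.1 or 2.0.0), the manual reset mentioned in the description could look roughly like the sketch below. It is an illustration only, not the patch that resolved this issue. The helper name force_reset_context is hypothetical. The attributes it touches (_jsc, _active_spark_context, _gateway, _jvm) are private PySpark internals; they are assumed here to hold the singleton state and may differ between versions. The broad "except Exception" stands in for Py4JError so that the snippet does not need py4j to be importable.

```python
import warnings


def force_reset_context(sc, context_cls):
    """Best-effort cleanup of a context whose JVM process has died.

    sc          -- the broken context instance
    context_cls -- its class (for PySpark, pyspark.SparkContext)

    Hypothetical helper: it relies on private attribute names that are
    assumed, not guaranteed, to exist in a given PySpark version.
    """
    jsc = getattr(sc, "_jsc", None)
    try:
        if jsc is not None:
            # With a dead JVM this call is expected to fail (Py4JError
            # in the log above); the failure is reported, not re-raised.
            jsc.stop()
    except Exception as exc:
        warnings.warn(
            "Could not stop the Java context cleanly: %r" % (exc,),
            RuntimeWarning,
        )
    finally:
        sc._jsc = None

    # Clear the class-level singleton state so a new context can be created.
    context_cls._active_spark_context = None
    context_cls._gateway = None
    context_cls._jvm = None
```

Under these assumptions, calling force_reset_context(sc, SparkContext) in the notebook and then constructing a new SparkContext would avoid restarting the kernel. Any other resources held by the old context (for example its accumulator server) are not handled by this sketch.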