[ https://issues.apache.org/jira/browse/SPARK-48711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-48711:
-----------------------------------
Labels: pull-request-available (was: )
> OOM killer may leave SparkContext in broken state causing ConnectionRefusedError
> --------------------------------------------------------------------------------
>
> Key: SPARK-48711
> URL: https://issues.apache.org/jira/browse/SPARK-48711
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Spark Core
> Affects Versions: 3.5.0
> Reporter: Rafal Wojdyla
> Priority: Major
> Labels: pull-request-available
>
> Related to https://issues.apache.org/jira/browse/SPARK-18523 and
> https://github.com/apache/spark/pull/15961. I'm currently on:
> {code}
> pyspark 3.5.0 pyhd8ed1ab_0 conda-forge
> py4j 0.10.9.7 pyhd8ed1ab_0 conda-forge
> {code}
> When the Spark JVM process gets OOM-killed, {{SparkContext.stop}} fails with
> {{ConnectionRefusedError}}, which leaves the {{SparkSession}}/{{SparkContext}} in a
> "dirty" state. https://issues.apache.org/jira/browse/SPARK-18523 addressed
> this by catching {{Py4JError}}, but the code now raises
> {{ConnectionRefusedError}} instead. That is a built-in {{OSError}} subclass,
> not a {{Py4JError}}, so it escapes the handler:
> {code}
> Traceback (most recent call last):
>   ...
>   File "<TRUNC>/lib/python3.11/site-packages/pyspark/sql/session.py", line 1796, in stop
>     self._sc.stop()
>   File "<TRUNC>/lib/python3.11/site-packages/pyspark/context.py", line 654, in stop
>     self._jsc.stop()
>   File "<TRUNC>/lib/python3.11/site-packages/py4j/java_gateway.py", line 1321, in __call__
>     answer = self.gateway_client.send_command(command)
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "<TRUNC>/lib/python3.11/site-packages/py4j/java_gateway.py", line 1036, in send_command
>     connection = self._get_connection()
>                  ^^^^^^^^^^^^^^^^^^^^^^
>   File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 284, in _get_connection
>     connection = self._create_new_connection()
>                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 291, in _create_new_connection
>     connection.connect_to_java_server()
>   File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 438, in connect_to_java_server
>     self.socket.connect((self.java_address, self.java_port))
> ConnectionRefusedError: [Errno 111] Connection refused
> {code}
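A toy model may help show why the SPARK-18523 handler does not help in this case. This is not PySpark's actual {{SparkContext.stop}} code. It only models the pattern: a {{try}}/{{except Py4JError}} around the JVM-side stop call, followed by cleanup of an "active context" slot. Because {{ConnectionRefusedError}} is an {{OSError}} subclass, it propagates past that clause and the cleanup is skipped. Assumptions: {{Py4JError}} here is a local stand-in class so the sketch runs without py4j, and {{ToyContext}}, {{dead_gateway}} and {{leaves_dirty_state}} are hypothetical names. Widening the caught exception types is one possible fix, not necessarily the approach the linked pull request takes.

```python
import warnings


class Py4JError(Exception):
    """Local stand-in for py4j.protocol.Py4JError (assumption: keeps the sketch py4j-free)."""


class ToyContext:
    """Hypothetical, heavily simplified model of a context whose stop() can fail."""

    _active = None  # models a class-level "active context" slot

    def __init__(self, jvm_stop, catch):
        self._jvm_stop = jvm_stop  # callable standing in for the JVM-side stop call
        self._catch = catch        # tuple of exception types stop() swallows
        ToyContext._active = self

    def stop(self):
        try:
            self._jvm_stop()
        except self._catch:
            # Mirrors the SPARK-18523 idea: warn instead of raising if the JVM is gone.
            warnings.warn("Unable to cleanly shut down the JVM process.", RuntimeWarning)
        # This cleanup is reached only if the handler above caught the error.
        ToyContext._active = None


def dead_gateway():
    """Simulates connecting to a gateway port where nothing is listening any more."""
    raise ConnectionRefusedError(111, "Connection refused")


def leaves_dirty_state(catch):
    """Return True if stop() lets the error escape and leaves the active slot set."""
    ctx = ToyContext(dead_gateway, catch)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        try:
            ctx.stop()
        except ConnectionRefusedError:
            pass  # the caller sees the error, as in the traceback above
    return ToyContext._active is ctx


# Catching only Py4JError: the OSError subclass escapes and the slot stays set.
print(leaves_dirty_state((Py4JError,)))                          # True
# Also catching ConnectionRefusedError: stop() completes its cleanup.
print(leaves_dirty_state((Py4JError, ConnectionRefusedError)))   # False
```

Catching the broader {{OSError}} would also cover other socket-level failures when the gateway is gone. Whether a fix should catch {{OSError}} or only {{ConnectionRefusedError}} is a design choice.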
--
This message was sent by Atlassian Jira
(v8.20.10#820010)