[ 
https://issues.apache.org/jira/browse/SPARK-48711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48711:
-----------------------------------
    Labels: pull-request-available  (was: )

> OOM killer may leave SparkContext in broken state causing 
> ConnectionRefusedError
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-48711
>                 URL: https://issues.apache.org/jira/browse/SPARK-48711
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 3.5.0
>            Reporter: Rafal Wojdyla
>            Priority: Major
>              Labels: pull-request-available
>
> Related to https://issues.apache.org/jira/browse/SPARK-18523, and 
> https://github.com/apache/spark/pull/15961. I'm currently on:
> {code}
> pyspark                   3.5.0              pyhd8ed1ab_0    conda-forge
> py4j                      0.10.9.7           pyhd8ed1ab_0    conda-forge
> {code}
> When the Spark JVM process gets OOM-killed, `SparkContext.stop` fails with 
> `ConnectionRefusedError`, which leaves the `SparkSession/Context` in a 
> "dirty" state. https://issues.apache.org/jira/browse/SPARK-18523 addressed 
> this by catching {{Py4JError}}, but it looks like the code now raises 
> {{ConnectionRefusedError}} instead:
> {code}
> Traceback (most recent call last):
>   ...
>   File "<TRUNC>/lib/python3.11/site-packages/pyspark/sql/session.py", line 1796, in stop
>     self._sc.stop()
>   File "<TRUNC>/lib/python3.11/site-packages/pyspark/context.py", line 654, in stop
>     self._jsc.stop()
>   File "<TRUNC>/lib/python3.11/site-packages/py4j/java_gateway.py", line 1321, in __call__
>     answer = self.gateway_client.send_command(command)
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "<TRUNC>/lib/python3.11/site-packages/py4j/java_gateway.py", line 1036, in send_command
>     connection = self._get_connection()
>                  ^^^^^^^^^^^^^^^^^^^^^^
>   File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 284, in _get_connection
>     connection = self._create_new_connection()
>                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 291, in _create_new_connection
>     connection.connect_to_java_server()
>   File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 438, in connect_to_java_server
>     self.socket.connect((self.java_address, self.java_port))
> ConnectionRefusedError: [Errno 111] Connection refused
> {code}
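
For illustration only, here is a minimal sketch of why a handler written for {{Py4JError}} would not catch this failure, and of one possible way to leave the Python side clean. {{ConnectionRefusedError}} is a built-in subclass of {{OSError}}, so an {{except Py4JError}} clause does not match it. The classes {{FakeJavaContext}} and {{FakeContext}} and the local {{Py4JError}} stand-in are hypothetical, introduced so the sketch runs without Spark or py4j; this is not the PySpark implementation and not necessarily the fix in the linked pull request.

```python
import warnings


class Py4JError(Exception):
    """Local stand-in for py4j.protocol.Py4JError (assumption: lets the sketch run without py4j)."""


class FakeJavaContext:
    """Hypothetical stand-in for the JVM-side context after the JVM was killed."""

    def stop(self):
        # Mirrors the last frame of the traceback: the socket connect is refused.
        raise ConnectionRefusedError(111, "Connection refused")


class FakeContext:
    """Hypothetical, heavily simplified model of a Python-side context object."""

    def __init__(self):
        self._jsc = FakeJavaContext()

    def stop(self):
        if self._jsc is None:
            return  # already stopped
        try:
            self._jsc.stop()
        except (Py4JError, ConnectionRefusedError):
            # Catching only Py4JError would let ConnectionRefusedError escape.
            warnings.warn(
                "Unable to cleanly shut down the JVM; it may have been killed.",
                RuntimeWarning,
            )
        finally:
            # Always drop the dead handle so the object is not left "dirty".
            self._jsc = None


if __name__ == "__main__":
    ctx = FakeContext()
    ctx.stop()  # emits a RuntimeWarning instead of raising
    print(ctx._jsc is None)
```

The {{finally}} clause is the part that matters for the "dirty" state: the reference to the dead JVM is cleared whether or not the stop call raised.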



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
