[ https://issues.apache.org/jira/browse/SPARK-33339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz updated SPARK-33339:
------------------------
    Description: 
When user code raises SystemExit (for example via sys.exit()), the Python
worker exits abnormally, but the executor task keeps waiting to read from the
worker's socket, so the application hangs.
The SystemExit may well come from a bug in the user's code, but Spark should at
least surface an error to remind the user instead of getting stuck.
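The "non Exception" in the summary is the crux: in Python, SystemExit (what
sys.exit() raises) inherits directly from BaseException, not from Exception, so
error handling that only catches Exception never sees it. A quick check:

```
# SystemExit sits outside the Exception branch of the hierarchy:
print(issubclass(SystemExit, Exception))      # False
print(issubclass(SystemExit, BaseException))  # True

try:
    raise SystemExit
except Exception:
    print("caught as Exception")              # never reached
except BaseException:
    print("caught only as BaseException")
```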
A simple test reproduces the problem:

```
from pyspark.sql import SparkSession

def err(line):
    # Simulates user code calling sys.exit(): SystemExit escapes the worker's
    # error handling because it is not an Exception.
    raise SystemExit

spark = SparkSession.builder.appName("test").getOrCreate()

# This job never completes: the executor hangs waiting on the dead worker.
spark.sparkContext.parallelize(range(1, 2), 2).map(err).collect()

spark.stop()
```
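On the Spark side, the natural fix is for the worker's task loop to catch
BaseException rather than Exception, so the failure is reported back over the
socket before the process dies. A minimal, self-contained sketch of that
pattern; run_task, report_error, and the surrounding plumbing are illustrative
stand-ins, not PySpark's actual internals:

```
import sys
import traceback

def run_task(func, iterator, report_error):
    """Simplified stand-in for a worker task loop (illustrative, not PySpark's code)."""
    try:
        return [func(item) for item in iterator]
    except BaseException:
        # "except Exception" would miss SystemExit; catching BaseException lets
        # the worker report the failure before dying, so the JVM side is not
        # left blocking on the worker socket forever.
        report_error(traceback.format_exc())
        raise

def err(x):
    raise SystemExit  # simulates sys.exit() inside user code

if __name__ == "__main__":
    try:
        run_task(err, range(3), lambda tb: print("worker error:\n" + tb, file=sys.stderr))
    except SystemExit:
        print("SystemExit surfaced to the caller instead of a silent hang")
```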
 

  was:
In a PySpark application, the worker does not catch BaseException, so once user
code calls sys.exit() due to some error, the application hangs without throwing
any exception, which makes the root cause hard to find.

For example, running `spark-submit --master yarn-client test.py` hangs with no
diagnostic output. The content of test.py:

```
from pyspark.sql import SparkSession

def err(line):
    raise SystemExit

spark = SparkSession.builder.appName("test").getOrCreate()

spark.sparkContext.parallelize(range(1, 2), 2).map(err).collect()

spark.stop()
```

        Summary: PySpark application will hang due to a non-Exception error  (was: 
pyspark application maybe hangup because of worker exit)

> PySpark application will hang due to a non-Exception error
> -----------------------------------------------------------
>
>                 Key: SPARK-33339
>                 URL: https://issues.apache.org/jira/browse/SPARK-33339
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5, 3.0.0, 3.0.1
>            Reporter: lrz
>            Priority: Major
>
> When user code raises SystemExit (for example via sys.exit()), the Python
> worker exits abnormally, but the executor task keeps waiting to read from
> the worker's socket, so the application hangs.
> The SystemExit may well come from a bug in the user's code, but Spark should
> at least surface an error to remind the user instead of getting stuck.
> A simple test reproduces the problem:
> ```
> from pyspark.sql import SparkSession
> def err(line):
>     raise SystemExit
> spark = SparkSession.builder.appName("test").getOrCreate()
> spark.sparkContext.parallelize(range(1, 2), 2).map(err).collect()
> spark.stop()
> ```
>  


