[ https://issues.apache.org/jira/browse/ARROW-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan updated ARROW-6382:
-----------------------
Description: When PyArrow is enabled, Pandas UDF exceptions raised by the Executor become impossible to catch: see the example below. Is this expected behavior? If so, what is the rationale? If not, how do I fix this?

Confirmed behavior in PyArrow 0.11 and 0.14.1 (latest) and PySpark 2.4.0 and 2.4.3, on Python 3.6.5.

To reproduce:
{code:python}
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()
# setting this to false will allow the exception to be caught
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

@udf
def disrupt(x):
    raise Exception("Test EXCEPTION")

data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
try:
    test = data.withColumn("test", disrupt("A")).toPandas()
except:
    print("exception caught")
print('end')
{code}
I would hope there's a way to catch the exception with the general except clause.
> Unable to catch Python UDF exceptions when using PyArrow
> --------------------------------------------------------
>
>                 Key: ARROW-6382
>                 URL: https://issues.apache.org/jira/browse/ARROW-6382
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.1
>         Environment: Ubuntu 18.04
>            Reporter: Jan
>            Priority: Minor
>
> When PyArrow is enabled, Pandas UDF exceptions raised by the Executor become
> impossible to catch: see the example below. Is this expected behavior?
> If so, what is the rationale? If not, how do I fix this?
> Confirmed behavior in PyArrow 0.11 and 0.14.1 (latest) and PySpark 2.4.0 and
> 2.4.3, on Python 3.6.5.
> To reproduce:
> {code:python}
> import pandas as pd
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import udf
>
> spark = SparkSession.builder.getOrCreate()
> # setting this to false will allow the exception to be caught
> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>
> @udf
> def disrupt(x):
>     raise Exception("Test EXCEPTION")
>
> data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
> try:
>     test = data.withColumn("test", disrupt("A")).toPandas()
> except:
>     print("exception caught")
> print('end')
> {code}
> I would hope there's a way to catch the exception with the general except
> clause.
>

--
This message was sent by Atlassian Jira
(v8.3.2#803003)