[ 
https://issues.apache.org/jira/browse/ARROW-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan updated ARROW-6382:
-----------------------
    Description: 
When PyArrow is enabled, Python UDF exceptions raised by the executor become
impossible to catch: see the example below. Is this expected behavior?

If so, what is the rationale? If not, how do I fix this?

Behavior confirmed with PyArrow 0.11 and 0.14.1 (latest) on PySpark 2.4.0 and
2.4.3, under Python 3.6.5.

To reproduce:
{code:python}
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

# setting this to false will allow the exception to be caught
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

@udf
def disrupt(x):
    raise Exception("Test EXCEPTION")

data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
try: 
    test = data.withColumn("test", disrupt("A")).toPandas()
except:
    print("exception caught")

print('end'){code}
I would hope there's a way to catch the exception with the general except 
clause.
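
The only workaround I have found is the one hinted at in the repro comment:
turning the Arrow flag off before collecting. A minimal sketch of that
workaround, continuing from the repro above (assumption on my part: the conf
change takes effect before the action runs):
{code:python}
# Workaround sketch: with Arrow disabled, the UDF failure surfaces as a
# regular driver-side error (per the observation in the comment above).
spark.conf.set("spark.sql.execution.arrow.enabled", "false")

try:
    test = data.withColumn("test", disrupt("A")).toPandas()
except Exception as e:
    # Catchable here once Arrow is disabled.
    print("exception caught:", e)
{code}
This of course gives up the Arrow-accelerated toPandas() path, so it is a
stopgap, not a fix.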

 

  was:
When PyArrow is enabled, Python UDF exceptions raised by the executor become
impossible to catch: see the example below. Is this expected behavior?

If so, what is the rationale? If not, how do I fix this?

Behavior confirmed with PyArrow 0.11 and 0.14.1 (latest) on PySpark 2.4.0 and
2.4.3, under Python 3.6.5.

To reproduce:

{code:python}
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

# setting this to false will allow the exception to be caught
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

@udf
def disrupt(x):
    raise Exception("Test EXCEPTION")

data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
try:
    test = data.withColumn("test", disrupt("A")).toPandas()
except:
    print("exception caught")

print('end')
{code}

I would hope there's a way to catch the exception with the general except 
clause.

 


> Unable to catch Python UDF exceptions when using PyArrow
> --------------------------------------------------------
>
>                 Key: ARROW-6382
>                 URL: https://issues.apache.org/jira/browse/ARROW-6382
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.1
>         Environment: Ubuntu 18.04
>            Reporter: Jan
>            Priority: Minor
>
> When PyArrow is enabled, Python UDF exceptions raised by the executor become 
> impossible to catch: see the example below. Is this expected behavior?
> If so, what is the rationale? If not, how do I fix this?
> Behavior confirmed with PyArrow 0.11 and 0.14.1 (latest) on PySpark 2.4.0 and 
> 2.4.3, under Python 3.6.5.
> To reproduce:
> {code:python}
> import pandas as pd
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import udf
> spark = SparkSession.builder.getOrCreate()
> # setting this to false will allow the exception to be caught
> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
> @udf
> def disrupt(x):
>     raise Exception("Test EXCEPTION")
> data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
> try: 
>     test = data.withColumn("test", disrupt("A")).toPandas()
> except:
>     print("exception caught")
> print('end'){code}
> I would hope there's a way to catch the exception with the general except 
> clause.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
