[ 
https://issues.apache.org/jira/browse/SPARK-53212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-53212:
-----------------------------------
    Labels: pull-request-available  (was: )

> Improve error handling for Scalar Pandas UDF returnType mismatches
> ------------------------------------------------------------------
>
>                 Key: SPARK-53212
>                 URL: https://issues.apache.org/jira/browse/SPARK-53212
>             Project: Spark
>          Issue Type: Documentation
>          Components: Pandas API on Spark
>    Affects Versions: 4.1.0, 4.0.0, kubernetes-operator-0.4.0
>            Reporter: Bill Schneider
>            Priority: Major
>              Labels: pull-request-available
>
> Continuation of SPARK-40770 for scalar UDFs that return pd.DataFrame: the
> improved error handling introduced there does not apply in this case. Example use case:
> {code:python}
> import pandas as pd
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import pandas_udf, struct
> from pyspark.sql.types import StructType, StructField, DoubleType
>
> # Note the duplicated "foo" field in the declared returnType:
> @pandas_udf(returnType=StructType([
>     StructField("foo", DoubleType()),
>     StructField("foo", DoubleType())]))
> def calculate_udf_fail(input: pd.DataFrame) -> pd.DataFrame:
>     return pd.DataFrame({
>         "foo": input['foo']
>     })
>
> spark = SparkSession.builder \
>     .appName("Test UDF") \
>     .getOrCreate()
> spark.sql("select 1 as foo").select(calculate_udf_fail(struct("foo")).alias("result")).show() {code}
> Repeating the same column twice in the returnType's `StructType` causes this
> cryptic error:
> {noformat}
> java.lang.IllegalArgumentException: not all nodes, buffers and variadicBufferCounts were consumed. nodes: [ArrowFieldNode [length=1, nullCount=0]] buffers: [ArrowBuf[10], address:140498365661192, capacity:0, ArrowBuf[11], address:140498365661192, capacity:8] variadicBufferCounts: []
> {noformat}
>  
> When the returned DataFrame lacks a declared field, I also still see
> "KeyError: foo" instead of the improved "missing field" error.
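
The kind of up-front validation being asked for could look roughly like the sketch below. It is a pandas-only illustration, not Spark's actual code path: `check_result_schema` is a hypothetical helper name, and the declared schema is modeled as a plain list of field-name strings rather than a real `StructType`. It shows how both failure modes reported above (a duplicated field in the declared returnType, and a field missing from the result DataFrame) could surface as clear errors instead of an Arrow-level `IllegalArgumentException` or a bare `KeyError`.

```python
import pandas as pd

def check_result_schema(result: pd.DataFrame, declared_fields: list) -> None:
    """Hypothetical pre-check: compare a scalar Pandas UDF's result columns
    against the declared returnType field names (modeled as strings here).

    Raises ValueError with a descriptive message instead of letting the
    mismatch surface later as an Arrow error or a KeyError.
    """
    # Duplicate field names in the declared schema (the first failure mode).
    dupes = sorted({f for f in declared_fields if declared_fields.count(f) > 1})
    if dupes:
        raise ValueError(f"Duplicate field(s) in declared returnType: {dupes}")

    # Declared fields absent from the returned DataFrame (the second failure mode).
    missing = [f for f in declared_fields if f not in result.columns]
    if missing:
        raise ValueError(f"Result DataFrame is missing declared field(s): {missing}")
```

With this check, the duplicated-"foo" schema from the example would fail with "Duplicate field(s) in declared returnType: ['foo']" before any Arrow serialization is attempted.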



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
