[ 
https://issues.apache.org/jira/browse/SPARK-53330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-53330:
------------------------------------

    Assignee: Ben Hurdelhey

> Fix Arrow UDF support for DayTimeIntervalType with bounds != start-end
> ----------------------------------------------------------------------
>
>                 Key: SPARK-53330
>                 URL: https://issues.apache.org/jira/browse/SPARK-53330
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 4.1.0
>            Reporter: Ben Hurdelhey
>            Assignee: Ben Hurdelhey
>            Priority: Major
>              Labels: pull-request-available
>
> When a PySpark UDF (useArrow=True) returns interval-typed data, it currently 
> fails with the error below when the resultType (e.g., DayTimeIntervalType) has 
> start/end fields that do not span the maximum DAY TO SECOND range:
> org.apache.spark.SparkException: [ARROW_TYPE_MISMATCH] Invalid schema from 
> pandas_udf(): expected DayTimeIntervalType(1,3), got 
> DayTimeIntervalType(0,3). SQLSTATE: 42K0G
>  
> Repro:
> ```python
> from pyspark.sql.types import DayTimeIntervalType
> from pyspark.sql.functions import udf
>
> @udf(useArrow=True, returnType=DayTimeIntervalType(0, 3))  # DAY TO SECOND: this works
> def return_interval1(x):
>     return x
>
> @udf(useArrow=True, returnType=DayTimeIntervalType(1, 3))  # HOUR TO SECOND: this fails
> def return_interval2(x):
>     return x
>
> spark.sql("SELECT INTERVAL '1 10:30:45.123' DAY TO SECOND AS value") \
>     .select(return_interval2("value")).collect()
> ```
>  
> YearMonthIntervalType is not supported in Arrow UDFs, so it is currently 
> not a concern.
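> The mismatch appears to come from the Arrow round-trip: Arrow exposes a 
> single duration type that carries no start/end-field metadata, so the Spark 
> type reconstructed from it presumably defaults to the widest DAY TO SECOND 
> bounds, and a strict equality check against the declared resultType then 
> fails. A minimal, dependency-free sketch of that suspected mechanism (the 
> class below is a simplified stand-in, not the real 
> pyspark.sql.types.DayTimeIntervalType):
> ```python
> # Hypothetical sketch: why DayTimeIntervalType(1, 3) fails the schema check
> # after an Arrow round-trip. Field codes follow Spark's convention:
> # 0=DAY, 1=HOUR, 2=MINUTE, 3=SECOND.
> from dataclasses import dataclass
>
> @dataclass(frozen=True)
> class DayTimeIntervalType:  # simplified stand-in for the PySpark class
>     startField: int = 0  # defaults to DAY
>     endField: int = 3    # defaults to SECOND
>
> def from_arrow_duration():
>     # Arrow's duration type has no field-bound metadata, so reconstructing
>     # a Spark type from it can only yield the full DAY TO SECOND range.
>     return DayTimeIntervalType(0, 3)
>
> declared = DayTimeIntervalType(1, 3)       # HOUR TO SECOND, as in the repro
> reconstructed = from_arrow_duration()      # always DAY TO SECOND
>
> # A strict equality check between the two raises ARROW_TYPE_MISMATCH even
> # though both types map to the same Arrow duration representation.
> assert reconstructed != declared
> assert reconstructed == DayTimeIntervalType(0, 3)
> ```
> A fix along these lines would compare only the physical Arrow type (or relax 
> the bounds check) rather than requiring exact field-bound equality.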



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
