[ https://issues.apache.org/jira/browse/SPARK-53330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-53330:
------------------------------------

    Assignee: Ben Hurdelhey

> Fix Arrow UDF support for DayTimeIntervalType with bounds != start-end
> ----------------------------------------------------------------------
>
>                 Key: SPARK-53330
>                 URL: https://issues.apache.org/jira/browse/SPARK-53330
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 4.1.0
>            Reporter: Ben Hurdelhey
>            Assignee: Ben Hurdelhey
>            Priority: Major
>              Labels: pull-request-available
>
> When a PySpark UDF (useArrow=True) returns interval-type data, it currently
> fails with the error below whenever the resultType (e.g., DayTimeIntervalType)
> has start/end fields that do not span the maximum DAY-to-SECOND range:
>
> org.apache.spark.SparkException: [ARROW_TYPE_MISMATCH] Invalid schema from
> pandas_udf(): expected DayTimeIntervalType(1,3), got
> DayTimeIntervalType(0,3). SQLSTATE: 42K0G
>
> Repro:
> ```
> from pyspark.sql.types import DayTimeIntervalType
> from pyspark.sql.functions import udf
>
> @udf(useArrow=True, returnType=DayTimeIntervalType(0, 3))  # this works
> def return_interval1(x):
>     return x
>
> @udf(useArrow=True, returnType=DayTimeIntervalType(1, 3))  # this fails
> def return_interval2(x):
>     return x
>
> spark.sql("SELECT INTERVAL '1 10:30:45.123' DAY TO SECOND AS value") \
>     .select(return_interval2("value")).collect()
> ```
>
> YearMonthIntervalType is not supported in Arrow UDFs, so it is currently not
> a concern.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
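A minimal sketch of why the strict check trips (hypothetical names and logic, not Spark's actual internals): Arrow serializes a day-time interval as a plain duration value, which carries no startField/endField metadata, so the type reconstructed from the Arrow schema always comes back with the default full-range bounds (0, 3). A field-by-field equality check against the declared returnType then fails for any narrower bounds, while a bounds-insensitive comparison would accept it:

```python
from dataclasses import dataclass

# Hypothetical stand-in for Spark's DayTimeIntervalType: startField/endField
# pick which of DAY(0), HOUR(1), MINUTE(2), SECOND(3) the interval spans.
@dataclass(frozen=True)
class DayTimeIntervalType:
    startField: int = 0  # DAY
    endField: int = 3    # SECOND

def type_from_arrow_duration() -> DayTimeIntervalType:
    # Arrow's duration type has no start/end-field metadata, so the
    # reconstructed Spark type always gets the default bounds (0, 3).
    return DayTimeIntervalType()

def types_match_strict(expected, actual) -> bool:
    # The failing check sketched here: field-by-field equality.
    return expected == actual

def types_match_lenient(expected, actual) -> bool:
    # A bounds-insensitive check: any two DayTimeIntervalType instances are
    # compatible, since the Arrow payload is the same duration either way.
    return isinstance(expected, DayTimeIntervalType) and isinstance(
        actual, DayTimeIntervalType
    )

expected = DayTimeIntervalType(1, 3)      # declared returnType: HOUR to SECOND
actual = type_from_arrow_duration()       # always comes back as (0, 3)
print(types_match_strict(expected, actual))   # False -> ARROW_TYPE_MISMATCH
print(types_match_lenient(expected, actual))  # True
```

This is only a model of the mismatch; the actual fix lives in PySpark's Arrow schema validation and may take a different shape.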