Lei (Eddy) Xu created SPARK-34600:
-------------------------------------

             Summary: Support user defined types in Pandas UDF
                 Key: SPARK-34600
                 URL: https://issues.apache.org/jira/browse/SPARK-34600
             Project: Spark
          Issue Type: New Feature
          Components: PySpark, SQL
    Affects Versions: 3.1.1, 3.0.2
            Reporter: Lei (Eddy) Xu


Because Pandas UDF uses pyarrow to passing data, it does not currently support 
UserDefinedTypes, as what normal python udf does.

For example:


{code:python}
class BoxType(UserDefinedType):
    @classmethod
    def sqlType(cls) -> StructType:
        return StructType(
            fields=[
                StructField("xmin", DoubleType(), False),
                StructField("ymin", DoubleType(), False),
                StructField("xmax", DoubleType(), False),
                StructField("ymax", DoubleType(), False),
            ]
        )

         @pandas_udf(
        returnType=StructType([StructField("boxes", ArrayType(Box()))]
    )
    def pandas_pf(s: pd.DataFrame) -> pd.DataFrame:
       yield s
{code}

The logs indicate t
{quote}
try:
                to_arrow_type(self._returnType_placeholder)
            except TypeError:
>               raise NotImplementedError(
                    "Invalid return type with scalar Pandas UDFs: %s is "
E                   NotImplementedError: Invalid return type with scalar Pandas 
UDFs: StructType(List(StructField(boxes,ArrayType(Box,true),true))) is not 
supported
{quote}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to