[ 
https://issues.apache.org/jira/browse/SPARK-33268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33268:
----------------------------------
    Fix Version/s: 3.0.2
                   2.4.8

> Fix bugs for casting data from/to PythonUserDefinedType
> -------------------------------------------------------
>
>                 Key: SPARK-33268
>                 URL: https://issues.apache.org/jira/browse/SPARK-33268
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.4.8, 3.0.2, 3.1.0
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>            Priority: Major
>             Fix For: 2.4.8, 3.0.2, 3.1.0
>
>
> This PR intends to fix bugs in casting data from/to PythonUserDefinedType. A 
> sequence of queries to reproduce this issue is as follows:
> {code} 
> >>> from pyspark.sql import Row
> >>> from pyspark.sql.functions import col
> >>> from pyspark.sql.types import *
> >>> from pyspark.testing.sqlutils import *
> >>> 
> >>> row = Row(point=ExamplePoint(1.0, 2.0))
> >>> df = spark.createDataFrame([row])
> >>> df.select(col("point").cast(PythonOnlyUDT()))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/dataframe.py", line 1402, in select
>     jdf = self._jdf.select(self._jcols(*cols))
>   File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
>   File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/utils.py", line 111, in deco
>     return f(*a, **kw)
>   File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o44.select.
> : java.lang.NullPointerException
>       at org.apache.spark.sql.types.UserDefinedType.acceptsType(UserDefinedType.scala:84)
>       at org.apache.spark.sql.catalyst.expressions.Cast$.canCast(Cast.scala:96)
>       at org.apache.spark.sql.catalyst.expressions.CastBase.checkInputDataTypes(Cast.scala:267)
>       at org.apache.spark.sql.catalyst.expressions.CastBase.resolved$lzycompute(Cast.scala:290)
>       at org.apache.spark.sql.catalyst.expressions.CastBase.resolved(Cast.scala:290)
> {code} 
> The root cause of this issue is that, since 
> {{PythonUserDefinedType#userClass}} is always null, {{isAssignableFrom}} in 
> {{UserDefinedType#acceptsType}} throws a {{NullPointerException}}. To fix it, 
> this PR defines {{acceptsType}} in {{PythonUserDefinedType}} and filters out 
> the null case in {{UserDefinedType#acceptsType}}.
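
The two-part fix described above can be illustrated with a small, Spark-free
Python model. This is only a sketch of the idea, not the actual patch: the
class names (UserDefinedTypeModel, PythonUserDefinedTypeModel), the
accepts_type method, and the py_udt attribute are hypothetical stand-ins, and
comparing Python UDTs by class name is an assumption about what the new
acceptsType override might do.

```python
from typing import Optional


class UserDefinedTypeModel:
    """Toy stand-in for the JVM-side UserDefinedType (hypothetical name)."""

    def __init__(self, user_class: Optional[type]) -> None:
        # For Python-only UDTs there is no JVM user class, so this is None.
        self.user_class = user_class

    def accepts_type(self, other: object) -> bool:
        if not isinstance(other, UserDefinedTypeModel):
            return False
        # Part 1 of the fix: filter out the null case before comparing
        # classes, instead of dereferencing a missing user class.
        if self.user_class is None or other.user_class is None:
            return False
        return type(self) is type(other) or issubclass(
            other.user_class, self.user_class
        )


class PythonUserDefinedTypeModel(UserDefinedTypeModel):
    """Toy stand-in for PythonUserDefinedType (hypothetical name)."""

    def __init__(self, py_udt: str) -> None:
        super().__init__(user_class=None)
        self.py_udt = py_udt  # qualified name of the Python UDT class

    def accepts_type(self, other: object) -> bool:
        # Part 2 of the fix: the Python UDT defines its own check.
        # Assumption: two Python UDTs match when they name the same class.
        return (
            isinstance(other, PythonUserDefinedTypeModel)
            and self.py_udt == other.py_udt
        )


same_a = PythonUserDefinedTypeModel("example.PointUDT")
same_b = PythonUserDefinedTypeModel("example.PointUDT")
different = PythonUserDefinedTypeModel("example.OtherUDT")
jvm_udt = UserDefinedTypeModel(user_class=int)

print(same_a.accepts_type(same_b))
print(same_a.accepts_type(different))
print(jvm_udt.accepts_type(same_a))
```

In this model, a cast check between a JVM-backed type and a Python-only type
returns False rather than raising, which mirrors the intent of filtering out
the null case.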


