[
https://issues.apache.org/jira/browse/SPARK-27052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Holden Karau reopened SPARK-27052:
----------------------------------
> Using PySpark udf in transform yields NULL values
> -------------------------------------------------
>
> Key: SPARK-27052
> URL: https://issues.apache.org/jira/browse/SPARK-27052
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 2.4.0
> Reporter: hejsgpuom62c
> Priority: Major
> Labels: bulk-closed
>
> Steps to reproduce
> {code:python}
> from typing import Optional
> from pyspark.sql.functions import expr
>
> def f(x: Optional[int]) -> Optional[int]:
>     return x + 1 if x is not None else None
>
> spark.udf.register('f', f, "integer")
> df = (spark
>     .createDataFrame([(1, [1, 2, 3])], ("id", "xs"))
>     .withColumn("xsinc", expr("transform(xs, x -> f(x))")))
> df.show()
> # +---+---------+-----+
> # | id|       xs|xsinc|
> # +---+---------+-----+
> # |  1|[1, 2, 3]| [,,]|
> # +---+---------+-----+
> {code}
>
> Source: https://stackoverflow.com/a/53762650
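
A possible workaround sketch, not taken from the report: assuming the NULLs arise only when the Python UDF is called inside the `transform` lambda, the same function can be applied to the whole array in a single UDF call. The names `f_all` and `with_incremented` are hypothetical, and `spark` is assumed to be an active SparkSession.

```python
from typing import List, Optional


def f(x: Optional[int]) -> Optional[int]:
    # Same element-wise function as in the steps to reproduce.
    return x + 1 if x is not None else None


def f_all(xs: Optional[List[Optional[int]]]) -> Optional[List[Optional[int]]]:
    # Hypothetical helper: map f over the whole array in Python,
    # so no UDF is invoked from inside a SQL lambda.
    return [f(x) for x in xs] if xs is not None else None


def with_incremented(spark):
    # Hypothetical wrapper; `spark` is assumed to be an active SparkSession.
    # pyspark is imported here so the helpers above stay usable without it.
    from pyspark.sql.functions import expr

    # Register the array-level function with an array<int> return type.
    spark.udf.register("f_all", f_all, "array<int>")
    return (spark
            .createDataFrame([(1, [1, 2, 3])], ("id", "xs"))
            .withColumn("xsinc", expr("f_all(xs)")))
```

Calling `with_incremented(spark).show()` in a PySpark shell would exercise the registered UDF. Whether this sidesteps the behavior on a given Spark version should be checked there.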
--
This message was sent by Atlassian Jira
(v8.20.10#820010)