Mark Hamilton created SPARK-34002:
-------------------------------------
Summary: Broken UDF behavior
Key: SPARK-34002
URL: https://issues.apache.org/jira/browse/SPARK-34002
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.0.1
Reporter: Mark Hamilton
A UDF can behave differently depending on whether the input DataFrame is cached,
even though the DataFrame's contents are identical.
Repro:
{code:java}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

case class Bar(a: Int)

// Always returns None, regardless of input
def f1(bar: Bar): Option[Bar] = {
  None
}

// Wraps its input; in Scala, Option(null) evaluates to None
def f2(bar: Bar): Option[Bar] = {
  Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Uncommenting the .cache() call makes this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0") //.cache()
val newDf = df
  .withColumn("c1", udf1(col("c0")))
  .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
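To compare the two cases side by side, something like the following may help. It is only a minimal sketch: it assumes a local SparkSession, and the names CachedUdfComparison and runPipeline are illustrative and not part of the original repro. Each run is wrapped in Try so that either an exception or a differing result shows up in the printed output.
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}
import scala.util.Try

// Top-level case class so Spark can derive a schema for it
case class Bar(a: Int)

// Hypothetical harness, not part of the original report
object CachedUdfComparison {
  def f1(bar: Bar): Option[Bar] = None
  def f2(bar: Bar): Option[Bar] = Option(bar)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough to exercise the behavior
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-34002-comparison")
      .getOrCreate()
    import spark.implicits._

    val udf1 = udf(f1 _)
    val udf2 = udf(f2 _)

    // Runs the same two-UDF pipeline, optionally caching the input first
    def runPipeline(cacheInput: Boolean): Try[Seq[String]] = Try {
      val base = (1 to 10).map(_ => Tuple1(Bar(1))).toDF("c0")
      val input = if (cacheInput) base.cache() else base
      input
        .withColumn("c1", udf1(col("c0")))
        .withColumn("c2", udf2(col("c1")))
        .collect()
        .map(_.toString)
        .toSeq
    }

    println(s"uncached: ${runPipeline(cacheInput = false)}")
    println(s"cached:   ${runPipeline(cacheInput = true)}")

    spark.stop()
  }
}
{code}
If the two printed lines differ, the discrepancy described above is present in the Spark build being tested.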