[jira] [Created] (SPARK-34002) Broken UDF behavior

Mark Hamilton (Jira) Mon, 04 Jan 2021 21:55:05 -0800

Mark Hamilton created SPARK-34002:
-------------------------------------

             Summary: Broken UDF behavior
                 Key: SPARK-34002
                 URL: https://issues.apache.org/jira/browse/SPARK-34002
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.1
            Reporter: Mark Hamilton



UDFs can behave differently depending on whether a DataFrame is cached, even 
though the DataFrame's contents are identical.

 

Repro:

 
{code:java}
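case class Bar(a: Int)

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

// Always returns None, regardless of input
def f1(bar: Bar): Option[Bar] = {
 None
}

// Wraps its argument in an Option; applied to the output of udf1 below
def f2(bar: Bar): Option[Bar] = {
 Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Uncommenting the .cache() call makes this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache()
val newDf = df
 .withColumn("c1", udf1(col("c0")))
 .withColumn("c2", udf2(col("c1")))
newDf.show()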
{code}
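
To make the cached/uncached difference easier to check, the repro could be wrapped so that both variants run in one program and their outcomes are compared. This is only a sketch under assumptions: the object name CacheComparison, the helpers pipeline and source, and the local[1] master are illustrative and not part of the original report. Each run is wrapped in Try so that either a result or an exception is captured, and no particular outcome is assumed.

```scala
import scala.util.Try

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Top-level case class so Spark can derive an encoder for it.
case class Bar(a: Int)

// Hypothetical wrapper object; not part of the original report.
object CacheComparison {
  def f1(bar: Bar): Option[Bar] = None
  def f2(bar: Bar): Option[Bar] = Option(bar)

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for the check.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-34002-check")
      .getOrCreate()
    import spark.implicits._

    val udf1 = udf(f1 _)
    val udf2 = udf(f2 _)

    // Same chain of UDF calls as in the repro; collects the rows or
    // captures whatever exception the job throws.
    def pipeline(input: DataFrame): Try[Seq[Row]] = Try {
      input
        .withColumn("c1", udf1(col("c0")))
        .withColumn("c2", udf2(col("c1")))
        .collect()
        .toSeq
    }

    // Builds a fresh input each time so the two runs do not share a plan.
    def source(): DataFrame = (1 to 10).map(_ => Tuple1(Bar(1))).toDF("c0")

    val uncached = pipeline(source())
    val cached = pipeline(source().cache())

    println(s"uncached: $uncached")
    println(s"cached:   $cached")
    // Two failures never compare equal here, since exceptions compare by reference.
    println(s"same outcome: ${uncached == cached}")

    spark.stop()
  }
}
```

If the two printed outcomes differ, that matches the behavior described in this report; identical outcomes would suggest the running Spark version is not affected.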
 



