Mark Hamilton created SPARK-34002:
-------------------------------------

Summary: Broken UDF behavior
Key: SPARK-34002
URL: https://issues.apache.org/jira/browse/SPARK-34002
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.0.1
Reporter: Mark Hamilton
UDFs can behave differently depending on whether a DataFrame is cached, even though the DataFrame's contents are identical.

Repro:

{code:scala}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

case class Bar(a: Int)

// Always returns None
def f1(bar: Bar): Option[Bar] = {
  None
}

// Wraps its input in an Option
def f2(bar: Bar): Option[Bar] = {
  Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Uncommenting .cache() makes this example work
val df = (1 to 10).map(_ => Tuple1(Bar(1))).toDF("c0") //.cache()

// c2 feeds the output of udf1 (column c1) into udf2
val newDf = df
  .withColumn("c1", udf1(col("c0")))
  .withColumn("c2", udf2(col("c1")))

newDf.show()
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)