Hi All, Sorry for the long email title. I am a bit surprised to find that the current optimizer rule "ConvertToLocalRelation" causes expressions to be eager-evaluated in planning phase, this can be demonstrated with the following code:
scala> val myUDF = udf((x: String) => { println("UDF evaled"); "result" }) myUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType))) scala> val df = Seq(("test", 1)).toDF("s", "i").select(myUDF($"s")) df: org.apache.spark.sql.DataFrame = [UDF(s): string] scala> println(df.queryExecution.optimizedPlan) UDF evaled LocalRelation [UDF(s)#9] This is somewhat unexpected to me because of Spark's lazy execution model. I am wondering if this behavior is by design? Thanks! Li