Hi All,

Sorry for the long email title. I am a bit surprised to find that the
current optimizer rule "ConvertToLocalRelation" causes expressions to be
eager-evaluated in planning phase, this can be demonstrated with the
following code:

scala> val myUDF = udf((x: String) => { println("UDF evaled"); "result" })

myUDF: org.apache.spark.sql.expressions.UserDefinedFunction =
UserDefinedFunction(<function1>,StringType,Some(List(StringType)))


scala> val df = Seq(("test", 1)).toDF("s", "i").select(myUDF($"s"))

df: org.apache.spark.sql.DataFrame = [UDF(s): string]


scala> println(df.queryExecution.optimizedPlan)

UDF evaled

LocalRelation [UDF(s)#9]

 This is somewhat unexpected to me because of Spark's lazy execution model.

I am wondering if this behavior is by design?

Thanks!
Li

Reply via email to