[
https://issues.apache.org/jira/browse/SPARK-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090485#comment-15090485
]
Liang-Chi Hsieh edited comment on SPARK-12648 at 1/9/16 7:26 AM:
-----------------------------------------------------------------
You don't need to handle the null value.
{code}
scala> val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", "weight")
df: org.apache.spark.sql.DataFrame = [name: string, weight: double]
scala> val addTwo = udf((d: Double) => d+2)
addTwo: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function1>,DoubleType,Some(List(DoubleType)))
scala> val df2 = df.withColumn("plusTwo", addTwo(df("weight")))
df2: org.apache.spark.sql.DataFrame = [name: string, weight: double ... 1 more field]
scala> df2.explain
== Physical Plan ==
Project [_1#0 AS name#2,_2#1 AS weight#3,if (isnull(_2#1)) null else UDF(_2#1) AS plusTwo#4]
+- Scan ExistingRDD[_1#0,_2#1]
{code}
You can see that Spark SQL inserts an if ... else to deal with null values, so the UDF body never receives a null.
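In spirit, the guard in the physical plan behaves like the following plain-Scala sketch (the name {{nullGuard}} is hypothetical; Spark's actual generated code is Java source produced by codegen, not this):
{code}
// Plain-Scala sketch of the null guard Spark SQL wraps around a
// primitive-typed UDF: a null input short-circuits to a null output,
// mirroring `if (isnull(_2#1)) null else UDF(_2#1)` in the plan above.
object NullGuardSketch {
  def nullGuard(f: Double => Double): java.lang.Double => java.lang.Double =
    d => if (d == null) null else java.lang.Double.valueOf(f(d.doubleValue))

  def main(args: Array[String]): Unit = {
    val addTwo = nullGuard(_ + 2)
    println(addTwo(4.0))  // 6.0
    println(addTwo(null)) // null
  }
}
{code}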
> UDF with Option[Double] throws ClassCastException
> -------------------------------------------------
>
> Key: SPARK-12648
> URL: https://issues.apache.org/jira/browse/SPARK-12648
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Mikael Valot
>
> I can write a UDF that returns an Option[Double], and the DataFrame's schema is correctly inferred to be a nullable double.
> However, I cannot write a UDF that takes an Option as an argument:
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.{SparkContext, SparkConf}
> val conf = new SparkConf().setMaster("local[4]").setAppName("test")
> val sc = new SparkContext(conf)
> val sqlc = new SQLContext(sc)
> import sqlc.implicits._
> val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", "weight")
> import org.apache.spark.sql.functions._
> val addTwo = udf((d: Option[Double]) => d.map(_+2))
> df.withColumn("plusTwo", addTwo(df("weight"))).show()
> =>
> 2016-01-05T14:41:52 Executor task launch worker-0 ERROR org.apache.spark.executor.Executor Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.ClassCastException: java.lang.Double cannot be cast to scala.Option
> at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:18) ~[na:na]
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) ~[na:na]
> at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51) ~[spark-sql_2.10-1.6.0.jar:1.6.0]
> at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49) ~[spark-sql_2.10-1.6.0.jar:1.6.0]
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) ~[scala-library-2.10.5.jar:na]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)