[
https://issues.apache.org/jira/browse/SPARK-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003943#comment-15003943
]
Herman van Hovell edited comment on SPARK-11725 at 11/13/15 12:54 PM:
----------------------------------------------------------------------
{{Int}} is a primitive type, so it has no way to express {{null}}; in these cases
Scala will default to {{0}}.
Use the boxed version of the primitive ({{java.lang.Integer}}) if you need nullability. For example:
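{noformat}
// Assumes spark-shell, where sqlContext.implicits._ (toDF, $"...") is already imported.
import org.apache.spark.sql.functions.udf

// The age column is built from java.lang.Integer values, so it can hold null.
val df = Seq(
  (null, "Michael"),
  (Integer.valueOf(30), "Andy"),
  (Integer.valueOf(19), "Justin")).toDF("age", "name")

// The UDF takes a boxed Integer, so it sees the null and can return null itself.
val f = udf((x: Integer) => {
  if (x != null) Integer.valueOf(x + 1)
  else null
})
df.withColumn("age2", f($"age")).show
{noformat}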
Would return:
{noformat}
+----+-------+----+
| age| name|age2|
+----+-------+----+
|null|Michael|null|
| 30| Andy| 31|
| 19| Justin| 20|
+----+-------+----+
{noformat}
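The same distinction can be seen in plain Scala, without Spark. This is only an illustrative sketch: {{plusOnePrimitive}} and {{plusOneBoxed}} are hypothetical helpers, not Spark API. Casting a {{null}} reference to {{Int}} yields {{0}}, so a function with an {{Int}} parameter never sees the {{null}}, whereas a {{java.lang.Integer}} parameter lets the function test for it, as the UDF above does.

```scala
// Plain Scala, no Spark needed. The helper names are hypothetical and only
// illustrate the primitive-vs-boxed distinction discussed above.

// Casting a null reference to the primitive Int silently yields 0.
val nullRef: Any = null
val asPrimitive: Int = nullRef.asInstanceOf[Int]

// Primitive parameter: by the time the function runs, the null is already gone.
def plusOnePrimitive(x: Int): Int = x + 1

// Boxed parameter: the function can see the null and decide what to return.
def plusOneBoxed(x: java.lang.Integer): java.lang.Integer =
  if (x != null) Integer.valueOf(x.intValue + 1) else null

println(asPrimitive)                       // 0
println(plusOnePrimitive(asPrimitive))     // 1
println(plusOneBoxed(null))                // null
println(plusOneBoxed(Integer.valueOf(30))) // 31
```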
> Let UDF to handle null value
> ----------------------------
>
> Key: SPARK-11725
> URL: https://issues.apache.org/jira/browse/SPARK-11725
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Jeff Zhang
>
> I notice that currently Spark treats a null long field as -1 when it is passed to a UDF.
> Here's the sample code.
> {code}
> sqlContext.udf.register("f", (x:Int)=>x+1)
> df.withColumn("age2", expr("f(age)")).show()
> //////////////// Output ///////////////////////
> +----+-------+----+
> | age| name|age2|
> +----+-------+----+
> |null|Michael| 0|
> | 30| Andy| 31|
> | 19| Justin| 20|
> +----+-------+----+
> {code}
> I think we have 3 options for null values:
> * Use a special value to represent it (what Spark does now)
> * Always return null if any UDF argument is null
> * Let the UDF itself handle null
> I would prefer the third option.
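A minimal plain-Scala sketch of the three options, assuming nullable values are modelled as {{java.lang.Integer}}. The wrapper names ({{withSentinel}}, {{nullPropagating}}, {{selfHandled}}) are hypothetical and are not how Spark implements UDFs. Using the sentinel -1 reported in the issue, option 1 reproduces the {{0}} shown for Michael in the output above, since -1 + 1 = 0.

```scala
// Plain-Scala sketch of the three null-handling options from the issue.
// Nullable values are modelled as java.lang.Integer; all names are
// hypothetical and do not correspond to any Spark API.

type Nullable = java.lang.Integer

// Option 1: substitute a sentinel for null before calling the function.
// -1 is the value the issue reports Spark using for a null long field.
def withSentinel(f: Int => Int, sentinel: Int = -1): Nullable => Nullable =
  x => Integer.valueOf(f(if (x == null) sentinel else x.intValue))

// Option 2: never call the function on null; propagate null instead.
def nullPropagating(f: Int => Int): Nullable => Nullable =
  x => if (x == null) null else Integer.valueOf(f(x.intValue))

// Option 3: pass the possibly-null value through and let the function decide.
def selfHandled(f: Nullable => Nullable): Nullable => Nullable = f

val plusOne: Int => Int = _ + 1

val opt1 = withSentinel(plusOne)
val opt2 = nullPropagating(plusOne)
// Mirrors the boxed-Integer UDF from the comment: null in, null out.
val opt3 = selfHandled(x => if (x == null) null else Integer.valueOf(x.intValue + 1))

for (age <- Seq[Nullable](null, Integer.valueOf(30), Integer.valueOf(19)))
  println(s"$age -> ${opt1(age)} / ${opt2(age)} / ${opt3(age)}")
```

Options 2 and 3 agree here only because the hypothetical function passed to {{selfHandled}} chooses to return {{null}}; under option 3 it could return any value for a {{null}} input, which is what the boxed {{Integer}} UDF in the comment above relies on.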
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)