[ https://issues.apache.org/jira/browse/SPARK-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-11725:
-------------------------------------
    Assignee: Wenchen Fan

> Let UDF to handle null value
> ----------------------------
>
>                 Key: SPARK-11725
>                 URL: https://issues.apache.org/jira/browse/SPARK-11725
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Jeff Zhang
>            Assignee: Wenchen Fan
>            Priority: Blocker
>              Labels: releasenotes
>             Fix For: 1.6.0
>
>
> I notice that Spark currently treats a long field as -1 if it is null.
> Here's the sample code.
> {code}
> import org.apache.spark.sql.functions.expr
>
> // Register a UDF that takes a primitive Int and adds 1.
> sqlContext.udf.register("f", (x: Int) => x + 1)
> // Apply it to the nullable "age" column.
> df.withColumn("age2", expr("f(age)")).show()
> //////////////// Output ///////////////////////
> +----+-------+----+
> | age|   name|age2|
> +----+-------+----+
> |null|Michael|   0|
> |  30|   Andy|  31|
> |  19| Justin|  20|
> +----+-------+----+
> {code}
> I think we have 3 options for null values:
> * Use a special value to represent null (what Spark does now)
> * Always return null if any argument to the UDF is null
> * Let the UDF itself handle null
> I would prefer the third option.
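
The three options listed in the report can be compared in a small plain-Scala sketch. It does not use Spark and is not how Spark implements any of them: the nullable "age" column is modeled as Seq[Option[Int]], and every name in it (NullHandlingSketch, withSentinel, nullPropagating, udfHandlesNull) is hypothetical. The sentinel of -1 is taken from the report's description.

```scala
// Plain-Scala sketch of the three null-handling options; no Spark dependency.
// All names in this object are hypothetical and chosen for illustration only.
object NullHandlingSketch {
  // The "age" column from the report, with null modeled as None.
  val ages: Seq[Option[Int]] = Seq(None, Some(30), Some(19))

  // The user function from the report: f(x) = x + 1.
  val f: Int => Int = _ + 1

  // Option 1: replace null with a special value before calling f.
  // The default of -1 follows the behaviour described in the report.
  def withSentinel(col: Seq[Option[Int]], sentinel: Int = -1): Seq[Option[Int]] =
    col.map(v => Some(f(v.getOrElse(sentinel))))

  // Option 2: return null whenever the input is null; f never sees a null.
  def nullPropagating(col: Seq[Option[Int]]): Seq[Option[Int]] =
    col.map(_.map(f))

  // Option 3: pass the nullable value through and let the function decide.
  def udfHandlesNull(col: Seq[Option[Int]])(g: Option[Int] => Option[Int]): Seq[Option[Int]] =
    col.map(g)

  def main(args: Array[String]): Unit = {
    println(withSentinel(ages))
    println(nullPropagating(ages))
    // Here the caller chooses to map a null age to 18 before adding 1.
    println(udfHandlesNull(ages)(v => Some(f(v.getOrElse(18)))))
  }
}
```

With the sentinel of -1, option 1 gives 0 for the null row, which matches the age2 value in the report's output. Option 2 keeps that row null, and option 3 leaves the choice to whoever writes the function.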