[jira] [Updated] (SPARK-13730) Nulls in dataframes getting converted to 0 with spark 2.0 SNAPSHOT

Michael Armbrust (JIRA) Mon, 07 Mar 2016 12:05:14 -0800

     [ https://issues.apache.org/jira/browse/SPARK-13730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael Armbrust updated SPARK-13730:
-------------------------------------
    Target Version/s: 2.0.0

> Nulls in dataframes getting converted to 0 with spark 2.0 SNAPSHOT
> ------------------------------------------------------------------
>
>                 Key: SPARK-13730
>                 URL: https://issues.apache.org/jira/browse/SPARK-13730
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.0.0
>            Reporter: Franklyn Dsouza
>            Priority: Critical
>
> I'm putting nulls into a LongType column (declared nullable in the schema 
> below) and then applying a transformation to that column. In the result, 
> the nulls have been converted to 0. 
> I haven't tested this on 1.6.1 or in Scala.
> Here's an example:
> {code}
> from pyspark.sql import types, functions as F
> sql_schema = types.StructType([
>   types.StructField("a", types.LongType(), True),
>   types.StructField("b", types.StringType(),  True),
> ])
> # sqlCtx is assumed to be an existing SQLContext (e.g. from the pyspark shell)
> df = sqlCtx.createDataFrame([
>     (1, "one"),
>     (None, "two"),
> ], sql_schema)
> # Everything is fine here
> df.collect() # [Row(a=1, b=u'one'), Row(a=None, b=u'two')]
> # Identity function: returns its input unchanged, so None should stay None
> def assert_not_null(val):
>     return val
> udf = F.udf(assert_not_null, types.LongType())
> df = df.withColumnRenamed('a', "_tmp_col")
> df = df.withColumn('a', udf(df._tmp_col))
> df = df.drop("_tmp_col")
> # None gets converted to 0
> df.collect() # [Row(b=u'one', a=1), Row(b=u'two', a=0)]
> {code}
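
One way to turn the reproduction above into an automated check is to compare the collected rows before and after the UDF step. The following is only a minimal sketch in plain Python, assuming the rows have first been converted to dicts (for example with `Row.asDict()`). The helper name `nulls_lost` is hypothetical and is not part of PySpark.

```python
def nulls_lost(before, after, column, key):
    """Return the key values of rows whose `column` was None before but not after.

    `before` and `after` are lists of dicts; `key` names a column that
    uniquely identifies each row (an assumption of this sketch).
    """
    after_by_key = {row[key]: row for row in after}
    lost = []
    for row in before:
        if row[column] is None:
            new_row = after_by_key.get(row[key])
            # A null was lost if the matching row now holds a non-None value.
            if new_row is not None and new_row[column] is not None:
                lost.append(row[key])
    return lost


# Values copied from the two collect() outputs in the report.
before = [{"a": 1, "b": "one"}, {"a": None, "b": "two"}]
after = [{"b": "one", "a": 1}, {"b": "two", "a": 0}]
print(nulls_lost(before, after, column="a", key="b"))  # ['two']
```

An empty list would indicate that every null survived the transformation. With the values reported here, the row keyed "two" is flagged.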



