[ https://issues.apache.org/jira/browse/SPARK-13730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-13730:
-------------------------------------
Target Version/s: 2.0.0
> Nulls in dataframes getting converted to 0 with spark 2.0 SNAPSHOT
> ------------------------------------------------------------------
>
> Key: SPARK-13730
> URL: https://issues.apache.org/jira/browse/SPARK-13730
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 2.0.0
> Reporter: Franklyn Dsouza
> Priority: Critical
>
> I'm putting nulls into a LongType column (declared nullable in the schema
> below) and applying a transformation to that column. In the result, the
> nulls are converted to 0.
> I haven't tested this on 1.6.1 or in Scala.
> Here's an example:
> {code}
> from pyspark.sql import types, functions as F
> # sqlCtx is assumed to be an existing SQLContext
> sql_schema = types.StructType([
>     types.StructField("a", types.LongType(), True),
>     types.StructField("b", types.StringType(), True),
> ])
> df = sqlCtx.createDataFrame([
>     (1, "one"),
>     (None, "two"),
> ], sql_schema)
> # Everything is fine here
> df.collect() # [Row(a=1, b=u'one'), Row(a=None, b=u'two')]
> def assert_not_null(val):
>     # Identity function: returns the input unchanged, including None
>     return val
> udf = F.udf(assert_not_null, types.LongType())
> df = df.withColumnRenamed('a', "_tmp_col")
> df = df.withColumn('a', udf(df._tmp_col))
> df = df.drop("_tmp_col")
> # None gets converted to 0
> df.collect() # [Row(b=u'one', a=1), Row(b=u'two', a=0)]
> {code}
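
One way to check for the behaviour described above is to compare where the nulls sit in a column before and after the transformation. The following is only a minimal sketch in plain Python, operating on rows that have already been collected and turned into dicts. The helper names `null_positions` and `nulls_lost` are hypothetical and not part of PySpark, and the sketch assumes both lists hold the same rows in the same order.

```python
def null_positions(rows, column):
    """Return the set of row indices whose value in `column` is None."""
    return {i for i, row in enumerate(rows) if row[column] is None}


def nulls_lost(before, after, column):
    """Return sorted indices where `column` was None before but is not None after.

    Assumes `before` and `after` contain the same rows in the same order.
    """
    return sorted(null_positions(before, column) - null_positions(after, column))


# Row values taken from the report: the second row's None comes back as 0.
before = [{"a": 1, "b": "one"}, {"a": None, "b": "two"}]
after = [{"b": "one", "a": 1}, {"b": "two", "a": 0}]

lost = nulls_lost(before, after, "a")
print(lost)
```

With PySpark, the input lists could presumably be built with `[row.asDict() for row in df.collect()]`. A non-empty result indicates that at least one null was replaced by a value.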