[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-30082:
----------------------------------
    Affects Version/s: 2.0.2
                       2.1.3
                       2.2.3
                       2.3.4

> Zeros are being treated as NaNs
> -------------------------------
>
>                 Key: SPARK-30082
>                 URL: https://issues.apache.org/jira/browse/SPARK-30082
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4
>            Reporter: John Ayad
>            Assignee: John Ayad
>            Priority: Major
>              Labels: correctness
>             Fix For: 2.4.5, 3.0.0
>
>
> If you attempt to run
> {code:java}
> df = df.replace(float('nan'), somethingToReplaceWith)
> {code}
> It will replace all {{0}}s in columns of type {{Integer}}
> Example code snippet to repro this:
> {code:java}
> from pyspark.sql import SQLContext
> spark = SQLContext(sc).sparkSession
> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> df.show()
> df = df.replace(float('nan'), 5)
> df.show()
> {code}
> Here's the output I get when I run this code:
> {code:java}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
>       /_/
> Using Python version 3.7.5 (default, Nov 1 2019 02:16:32)
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SQLContext
> >>> spark = SQLContext(sc).sparkSession
> >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> >>> df.show()
> +-----+-----+
> |index|value|
> +-----+-----+
> |    1|    0|
> |    2|    3|
> |    3|    0|
> +-----+-----+
> >>> df = df.replace(float('nan'), 5)
> >>> df.show()
> +-----+-----+
> |index|value|
> +-----+-----+
> |    1|    5|
> |    2|    3|
> |    3|    5|
> +-----+-----+
> >>>
> {code}
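
For anyone stuck on an affected version, one possible workaround (a sketch, not something taken from the issue or its fix) is to restrict the replacement to columns that can actually hold NaN. The helper names `float_columns` and `replace_nan` below are hypothetical; the only PySpark features assumed are `DataFrame.dtypes`, which yields `(name, type_string)` pairs, and the `subset` argument of `DataFrame.replace`.

```python
def float_columns(dtypes):
    """Return the names of columns whose Spark SQL type is float or double.

    `dtypes` is a list of (column_name, type_string) pairs, the shape
    that PySpark's DataFrame.dtypes returns.
    """
    return [name for name, dtype in dtypes if dtype in ("float", "double")]


def replace_nan(df, value):
    """Replace NaN only in float/double columns (hypothetical helper).

    Integer columns are never passed to replace(), so their zeros
    should be left untouched on the affected versions.
    """
    cols = float_columns(df.dtypes)
    if not cols:
        # No column can hold NaN, so there is nothing to replace.
        return df
    return df.replace(float("nan"), value, subset=cols)
```

With the DataFrame from the report, both columns are inferred as `bigint`, so `replace_nan(df, 5)` would return `df` unchanged instead of rewriting the zeros.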