Jason Darrell Lowe created SPARK-30530:
---------------------------------------

Summary: CSV load followed by "is null" filter produces incorrect results
Key: SPARK-30530
URL: https://issues.apache.org/jira/browse/SPARK-30530
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.0
Reporter: Jason Darrell Lowe

Filtering on "is null" over values loaded from a CSV file has regressed recently and now produces incorrect results. Given a CSV file with the contents:

{noformat:title=floats.csv}
100.0,1.0,
200.0,,
300.0,3.0,
1.0,4.0,
,4.0,
500.0,,
,6.0,
-500.0,50.5
{noformat}

Filtering this data for the first column being null should return exactly two rows, but it is returning extraneous rows in which non-null values have been replaced with nulls:

{noformat}
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> val schema = StructType(Array(StructField("floats", FloatType, true), StructField("more_floats", FloatType, true)))
schema: org.apache.spark.sql.types.StructType = StructType(StructField(floats,FloatType,true), StructField(more_floats,FloatType,true))

scala> val df = spark.read.schema(schema).csv("floats.csv")
df: org.apache.spark.sql.DataFrame = [floats: float, more_floats: float]

scala> df.filter("floats is null").show
+------+-----------+
|floats|more_floats|
+------+-----------+
|  null|       null|
|  null|       null|
|  null|       null|
|  null|       null|
|  null|        4.0|
|  null|       null|
|  null|        6.0|
+------+-----------+
{noformat}
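For reference, the expected result can be checked outside Spark. The following is a minimal Python sketch (not Spark itself) that reads the same floats.csv contents, treats empty CSV fields as nulls the way a nullable float schema should, and filters on the first column being null; it yields exactly the two rows (null, 4.0) and (null, 6.0) that the Spark query above should have returned:

```python
import csv
import io

# Contents of floats.csv from the report above.
data = """100.0,1.0,
200.0,,
300.0,3.0,
1.0,4.0,
,4.0,
500.0,,
,6.0,
-500.0,50.5
"""

# Treat empty fields as null (None), mirroring how a nullable FloatType
# column should load; only the first two columns carry data.
rows = [
    [float(field) if field else None for field in record[:2]]
    for record in csv.reader(io.StringIO(data))
]

# "floats is null" should match only the rows whose first column is empty.
null_floats = [row for row in rows if row[0] is None]
print(null_floats)  # [[None, 4.0], [None, 6.0]]
```

So only two of the eight input rows have a null first column, yet the Spark output above shows seven rows, with values from other rows corrupted to null.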