[ https://issues.apache.org/jira/browse/SPARK-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418925#comment-16418925 ]
Max Murphy commented on SPARK-15125:
------------------------------------
[~snanda] That would not allow ,, to be distinguished from ,"", whereas we want
,, => null and ,"", => "".
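
To make the distinction concrete, here is a minimal sketch in plain Scala. It is not Spark's CSV parser or the proposed fix; the object QuotedEmptySketch and the method parseLine are hypothetical names used only for illustration. It assumes a comma separator, a double-quote quote character, and doubled quotes as the escape. The point is that the parser has to remember whether a field was quoted, because once the quotes are stripped both ,, and ,"", leave an empty string behind.

```scala
// Hypothetical helper for illustration only; this is not Spark's parser.
object QuotedEmptySketch {

  /** Splits one CSV line.
    * An unquoted empty field becomes None (null);
    * a quoted empty field becomes Some("").
    */
  def parseLine(line: String, sep: Char = ',', quote: Char = '"'): Vector[Option[String]] = {
    val fields = Vector.newBuilder[Option[String]]
    val buf = new StringBuilder
    var inQuotes = false   // currently inside a quoted section
    var wasQuoted = false  // the current field contained a quoted section
    var i = 0

    // Emit the current field and reset the per-field state.
    def flush(): Unit = {
      val text = buf.toString
      fields += (if (text.isEmpty && !wasQuoted) None else Some(text))
      buf.clear()
      wasQuoted = false
    }

    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == quote) {
          if (i + 1 < line.length && line.charAt(i + 1) == quote) {
            buf.append(quote) // doubled quote inside quotes is a literal quote
            i += 1
          } else {
            inQuotes = false
          }
        } else {
          buf.append(c)
        }
      } else if (c == quote) {
        inQuotes = true
        wasQuoted = true
      } else if (c == sep) {
        flush()
      } else {
        buf.append(c)
      }
      i += 1
    }
    flush() // last field on the line
    fields.result()
  }
}
```

With the rows from the repro in the issue description, QuotedEmptySketch.parseLine("2015,Porsche,\"\",,") yields Some("") for model and None for both comment and price, which is the ,"", => "" and ,, => null behaviour we want.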
> CSV data source recognizes empty quoted strings in the input as null.
> ----------------------------------------------------------------------
>
> Key: SPARK-15125
> URL: https://issues.apache.org/jira/browse/SPARK-15125
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Suresh Thalamati
> Priority: Major
>
> The CSV data source does not differentiate between empty quoted strings and
> empty fields; it reads both as null. In some scenarios users want to
> differentiate between these values, especially in the context of SQL, where
> NULL and the empty string have different meanings. If the input data is a dump
> from a traditional relational data source, users will see different results
> for their SQL queries.
> {code}
> Repro:
> Test Data: (test.csv)
> year,make,model,comment,price
> 2017,Tesla,Mode 3,looks nice.,35000.99
> 2016,Chevy,Bolt,"",29000.00
> 2015,Porsche,"",,
> scala> val df= sqlContext.read.format("csv").option("header",
> "true").option("inferSchema", "true").option("nullValue",
> null).load("/tmp/test.csv")
> df: org.apache.spark.sql.DataFrame = [year: int, make: string ... 3 more
> fields]
> scala> df.show
> +----+-------+------+-----------+--------+
> |year|   make| model|    comment|   price|
> +----+-------+------+-----------+--------+
> |2017|  Tesla|Mode 3|looks nice.|35000.99|
> |2016|  Chevy|  Bolt|       null| 29000.0|
> |2015|Porsche|  null|       null|    null|
> +----+-------+------+-----------+--------+
> Expected:
> +----+-------+------+-----------+--------+
> |year|   make| model|    comment|   price|
> +----+-------+------+-----------+--------+
> |2017|  Tesla|Mode 3|looks nice.|35000.99|
> |2016|  Chevy|  Bolt|           | 29000.0|
> |2015|Porsche|      |       null|    null|
> +----+-------+------+-----------+--------+
> {code}
> Testing a fix for this issue. I will take a shot at submitting a PR for
> this soon.