[jira] [Assigned] (SPARK-23786) CSV schema validation - column names are not checked

Apache Spark (JIRA) Fri, 23 Mar 2018 14:16:22 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-23786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Apache Spark reassigned SPARK-23786:
------------------------------------

    Assignee:     (was: Apache Spark)

> CSV schema validation - column names are not checked
> ----------------------------------------------------
>
>                 Key: SPARK-23786
>                 URL: https://issues.apache.org/jira/browse/SPARK-23786
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Maxim Gekk
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Here is a CSV file that contains two columns of the same type:
> {code}
> $ cat marina.csv
> depth, temperature
> 10.2, 9.0
> 5.5, 12.3
> {code}
> If we define the schema with correct types but wrong column names (reversed 
> order):
> {code:scala}
> import org.apache.spark.sql.types.{DoubleType, StructType}
> val schema = new StructType().add("temperature", DoubleType).add("depth", DoubleType)
> {code}
> Spark reads the csv file without any errors:
> {code:scala}
> val ds = spark.read.schema(schema).option("header", "true").csv("marina.csv")
> ds.show
> {code}
> and outputs a wrong result:
> {code}
> +-----------+-----+
> |temperature|depth|
> +-----------+-----+
> |       10.2|  9.0|
> |        5.5| 12.3|
> +-----------+-----+
> {code}
> The correct behavior would be either to report an error or to read the 
> columns according to their names in the schema.
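
The two behaviors proposed in the description can be illustrated with a minimal sketch in plain Scala. This is not Spark's implementation: `HeaderCheck`, `checkSameOrder`, and `resolveColumns` are hypothetical names introduced here only for illustration, and the sketch assumes a comma-separated header with no quoting, surrounding whitespace trimmed, and case-sensitive name comparison.

```scala
// Hypothetical helper, not part of Spark: compares the names in a CSV header
// line with the field names of a user-supplied schema.
object HeaderCheck {

  // Splits a header line on commas and trims whitespace around each name.
  // Assumption: no quoted fields and no embedded commas.
  private def parseHeader(headerLine: String): List[String] =
    headerLine.split(",").map(_.trim).toList

  /** Option 1 (report an error): names must match position by position. */
  def checkSameOrder(headerLine: String, schemaNames: Seq[String]): Either[String, Unit] = {
    val header = parseHeader(headerLine)
    if (header == schemaNames.toList) Right(())
    else Left(
      s"CSV header [${header.mkString(", ")}] does not match " +
        s"schema [${schemaNames.mkString(", ")}]")
  }

  /** Option 2 (read by name): for each schema field, find the position of the
    * CSV column with the same name, or fail if a field is absent. */
  def resolveColumns(headerLine: String, schemaNames: Seq[String]): Either[String, Seq[Int]] = {
    val header = parseHeader(headerLine)
    val missing = schemaNames.filterNot(name => header.contains(name))
    if (missing.nonEmpty)
      Left(s"Schema fields not found in CSV header: ${missing.mkString(", ")}")
    else
      Right(schemaNames.map(name => header.indexOf(name)))
  }
}
```

With the header from `marina.csv` and the reversed schema above, `checkSameOrder` yields a `Left` describing the mismatch, while `resolveColumns` yields the column positions to read for each schema field, so that `temperature` would be taken from the second CSV column and `depth` from the first.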



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
