Maxim Gekk created SPARK-23786:
----------------------------------

             Summary: CSV schema validation - column names are not checked
                 Key: SPARK-23786
                 URL: https://issues.apache.org/jira/browse/SPARK-23786
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Maxim Gekk


Here is a CSV file that contains two columns of the same type:
{code}
$ cat marina.csv
depth, temperature
10.2, 9.0
5.5, 12.3
{code}

If we define the schema with correct types but wrong column names (reversed 
order):
{code:scala}
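import org.apache.spark.sql.types.{DoubleType, StructType}
// Types are correct, but the names are in the reverse order of the CSV header
val schema = new StructType().add("temperature", DoubleType).add("depth", DoubleType)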
{code}

Spark reads the csv file without any errors:
{code:scala}
val ds = spark.read.schema(schema).option("header", "true").csv("marina.csv")
ds.show
{code}
and outputs the wrong result:
{code}
+-----------+-----+
|temperature|depth|
+-----------+-----+
|       10.2|  9.0|
|        5.5| 12.3|
+-----------+-----+
{code}
The correct behavior would be either to raise an error or to read the columns 
according to their names in the schema.
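
To make the two options concrete, here is a minimal sketch in plain Scala of what such a check might look like. It is not Spark's implementation: `HeaderCheck`, `validateHeader` and `reorderIndices` are hypothetical names, the header is split naively on commas (no quoting support), and the schema names are assumed to be passed in as a plain sequence, for example from `schema.fieldNames`.

```scala
// Hypothetical helper, not part of Spark: compares the names in a CSV header
// line with the field names of a user-supplied schema.
object HeaderCheck {

  // Naive header parsing: assumes no quoted fields or embedded commas.
  private def parseHeader(headerLine: String): Seq[String] =
    headerLine.split(",", -1).map(_.trim).toSeq

  /** Option 1: fail when header and schema names differ at any position. */
  def validateHeader(headerLine: String, schemaNames: Seq[String]): Either[String, Unit] = {
    val headerNames = parseHeader(headerLine)
    if (headerNames.length != schemaNames.length) {
      Left(s"Header has ${headerNames.length} columns, schema has ${schemaNames.length}")
    } else {
      val mismatches = headerNames.zip(schemaNames).zipWithIndex.collect {
        case ((h, s), i) if h != s => s"column $i: header '$h' vs schema '$s'"
      }
      if (mismatches.isEmpty) Right(()) else Left(mismatches.mkString("; "))
    }
  }

  /** Option 2: for each schema name, find the index of the matching CSV column. */
  def reorderIndices(headerLine: String, schemaNames: Seq[String]): Option[Seq[Int]] = {
    val headerNames = parseHeader(headerLine)
    val indices = schemaNames.map(name => headerNames.indexOf(name))
    if (indices.contains(-1)) None else Some(indices)
  }
}
```

With the header from marina.csv and the reversed schema above, `validateHeader` would return a `Left` describing both mismatched columns, while `reorderIndices` would map temperature to CSV column 1 and depth to CSV column 0.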



