Maxim Gekk created SPARK-23786:
----------------------------------
Summary: CSV schema validation - column names are not checked
Key: SPARK-23786
URL: https://issues.apache.org/jira/browse/SPARK-23786
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.0
Reporter: Maxim Gekk
Here is a CSV file that contains two columns of the same type:
{code}
$ cat marina.csv
depth, temperature
10.2, 9.0
5.5, 12.3
{code}
If we define the schema with correct types but wrong column names (reversed
order):
{code:scala}
import org.apache.spark.sql.types.{DoubleType, StructType}
val schema = new StructType().add("temperature", DoubleType).add("depth", DoubleType)
{code}
Spark reads the csv file without any errors:
{code:scala}
val ds = spark.read.schema(schema).option("header", "true").csv("marina.csv")
ds.show
{code}
and outputs a wrong result, with the values shown under swapped column names:
{code}
+-----------+-----+
|temperature|depth|
+-----------+-----+
|       10.2|  9.0|
|        5.5| 12.3|
+-----------+-----+
{code}
The correct behavior would be either to raise an error or to read the columns
according to their names in the schema.
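The first option (raising an error) could be sketched roughly as follows. This is only an illustration in plain Scala, not Spark's implementation: `HeaderCheck` and `mismatches` are hypothetical names, and trimming whitespace around header names and comparing case-sensitively are assumptions made for the example.

```scala
// Hypothetical helper, not part of Spark: compares the CSV header names with
// the field names of the user-supplied schema, position by position.
object HeaderCheck {
  /** Returns (position, headerName, schemaName) for every mismatch. */
  def mismatches(headerLine: String,
                 schemaNames: Seq[String],
                 sep: Char = ','): Seq[(Int, String, String)] = {
    // Assumption: surrounding whitespace in header names is not significant.
    val headerNames = headerLine.split(sep).map(_.trim).toSeq
    // zipAll pads the shorter side so a differing column count is reported too.
    headerNames.zipAll(schemaNames, "<missing>", "<missing>")
      .zipWithIndex
      .collect { case ((h, s), i) if h != s => (i, h, s) } // case-sensitive
  }
}
```

For the file above, `HeaderCheck.mismatches("depth, temperature", Seq("temperature", "depth"))` is non-empty, so a reader could fail fast instead of silently returning swapped columns. In the reported scenario, the second argument would be `schema.fieldNames`.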