Barry Becker created SPARK-17041:
------------------------------------
Summary: Columns in schema are no longer case sensitive when
reading csv file
Key: SPARK-17041
URL: https://issues.apache.org/jira/browse/SPARK-17041
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 2.0.0
Reporter: Barry Becker
It used to be (in Spark 1.6.2) that I could read a CSV file whose column
names differed only by case. For example, one column might be named "output"
and another "Output". Now (with Spark 2.0.0), if I try to read such a
file, I get an error like this:
{code}
org.apache.spark.sql.AnalysisException: Reference 'Output' is ambiguous, could
be: Output#1263, Output#1295.;
{code}
The schema (dfSchema below) that I pass to the csv read looks like this:
{code}
StructType( StructField(Output,StringType,true), ...
StructField(output,StringType,true), ...)
{code}
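To make the clash concrete, here is a minimal sketch in plain Scala (no Spark dependency) that groups column names by their lower-cased form and reports any group with more than one distinct spelling. The object name CaseCollisions and the method collisions are hypothetical, made up for this illustration, and the sketch only assumes that the analyzer compares names case-insensitively; it is not how Spark itself resolves references.
{code}
import java.util.Locale

// Hypothetical helper, not part of Spark: find column names that become
// indistinguishable when compared case-insensitively.
object CaseCollisions {
  def collisions(names: Seq[String]): Map[String, Seq[String]] =
    names
      .groupBy(_.toLowerCase(Locale.ROOT))
      // Keep only groups that contain more than one distinct spelling.
      .filter { case (_, group) => group.distinct.size > 1 }

  def main(args: Array[String]): Unit = {
    // Only the two names quoted in the report; the elided fields are left out.
    val names = Seq("Output", "output")
    collisions(names).foreach { case (key, group) =>
      println(s"$key -> ${group.mkString(", ")}")
    }
  }
}
{code}
With a real schema, the names could be taken from dfSchema.fieldNames.toSeq and passed to the same method.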
The code that does the read is this:
{code}
sqlContext.read
  .format("csv")
  .option("header", "false") // Do not treat the first line as a header
  .option("inferSchema", "false") // Do not infer types; use the supplied schema
  .schema(dfSchema)
  .csv(dataFile)
{code}
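One thing that may be worth trying is the spark.sql.caseSensitive setting, which controls whether the analyzer treats identifiers as case sensitive. The sketch below is an assumption on my part, not something confirmed in this report: it turns the setting on when building the session and repeats the same read with a cut-down two-column schema. The object name, the app name, and taking the file path from args(0) are all hypothetical.
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object CaseSensitiveCsvRead {
  def main(args: Array[String]): Unit = {
    // Assumption: enabling case-sensitive analysis lets "Output" and
    // "output" resolve as two distinct columns.
    val spark = SparkSession.builder()
      .appName("case-sensitive-csv")
      .master("local[*]")
      .config("spark.sql.caseSensitive", "true")
      .getOrCreate()

    // Cut-down schema with only the two clashing names from the report.
    val dfSchema = StructType(Seq(
      StructField("Output", StringType, nullable = true),
      StructField("output", StringType, nullable = true)))

    // Hypothetical: path to a headerless two-column CSV file.
    val dataFile = args(0)

    val df = spark.read
      .option("header", "false")
      .schema(dfSchema)
      .csv(dataFile)

    // Reference both columns by name, which is where the ambiguity surfaces.
    df.select("Output", "output").show()

    spark.stop()
  }
}
{code}
When working with an existing sqlContext rather than a new session, sqlContext.setConf("spark.sql.caseSensitive", "true") before the read should have the same effect.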
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)