David Crossland created SPARK-7301:
--------------------------------------

             Summary: Issue with duplicated fields in interpreted json schemas
                 Key: SPARK-7301
                 URL: https://issues.apache.org/jira/browse/SPARK-7301
             Project: Spark
          Issue Type: Bug
            Reporter: David Crossland


I have a large JSON dataset that has evolved over time, so some fields have 
been slightly renamed or capitalised differently.  As a result, Spark considers 
certain fields ambiguous, and when I attempt to access them I get the 
following error:

org.apache.spark.sql.AnalysisException: Ambiguous reference to fields 
StructField(Currency,StringType,true), StructField(currency,StringType,true);
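
To show which fields are affected, here is a minimal sketch that lists the top-level field names in a schema that differ only by case. The helper name caseCollisions and the sample schema (including the "amount" field) are hypothetical and only for illustration; the code uses nothing beyond the StructType/StructField classes.

```scala
import java.util.Locale
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper (not part of Spark): group top-level field names that
// are equal once lower-cased, and keep only the groups that contain more
// than one spelling.
def caseCollisions(schema: StructType): Map[String, Seq[String]] =
  schema.fields.toSeq
    .map(_.name)
    .groupBy(_.toLowerCase(Locale.ROOT))
    .filter { case (_, names) => names.size > 1 }

// Illustrative schema standing in for the inferred one.
val inferred = StructType(Seq(
  StructField("Currency", StringType, true),
  StructField("currency", StringType, true),
  StructField("amount", StringType, true)))

// Reports the "Currency"/"currency" pair and ignores "amount".
println(caseCollisions(inferred))
```

This only inspects the top level of the schema; nested structs would need the same check applied recursively.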

There appears to be no way to resolve an ambiguous field after it has been 
inferred by Spark SQL, other than to manually construct the schema using 
StructType/StructField, which is a bit heavy-handed as the schema is quite 
large.  Is there some way to resolve an ambiguous reference, or to alter the 
schema after inference?  It seems like something of a bug that I can't tell 
Spark to treat both fields as though they were the same.  I've created a test 
where I manually defined a schema as 

import org.apache.spark.sql.types.{StringType, StructField, StructType}
val schema = StructType(Seq(StructField("A", StringType, true)))

And it returns 2 rows when I perform a count on the following dataset:

{"A":"test1"}
{"a":"test2"}

If I could modify the schema to remove the duplicate entries, then I could work 
around this issue.
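
As a rough illustration of the kind of schema edit I have in mind, the sketch below takes a StructType and keeps only the first field seen for each case-insensitive name, recursing into nested structs. The helper name dedupeIgnoringCase is hypothetical and is not a Spark API. Note the assumption that dropping one spelling is acceptable: this only removes the ambiguity from the schema and does not merge the values stored under the other spelling.

```scala
import java.util.Locale
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper (not part of Spark): drop fields whose names differ
// only by case, keeping the first spelling encountered at each level.
def dedupeIgnoringCase(schema: StructType): StructType = {
  val seen = scala.collection.mutable.Set.empty[String]
  val kept = schema.fields.toSeq.flatMap { field =>
    // Set.add returns false when the lower-cased name was already present.
    if (seen.add(field.name.toLowerCase(Locale.ROOT))) {
      val dataType = field.dataType match {
        case nested: StructType => dedupeIgnoringCase(nested)
        case other              => other
      }
      Some(field.copy(dataType = dataType))
    } else None
  }
  StructType(kept)
}

// Illustrative input mirroring the fields named in the error message.
val ambiguous = StructType(Seq(
  StructField("Currency", StringType, true),
  StructField("currency", StringType, true)))

val cleaned = dedupeIgnoringCase(ambiguous)
println(cleaned.fieldNames.mkString(", "))
```

The cleaned schema could then be supplied as an explicit schema when re-reading the JSON. Structs nested inside arrays or maps are not handled by this sketch.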
