[ 
https://issues.apache.org/jira/browse/SPARK-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193491#comment-14193491
 ] 

William Benton commented on SPARK-4185:
---------------------------------------

I'm actually not sure this is a bug!  My main concern is that whatever type we
infer for a collection of objects like this, it will be very difficult to write
meaningful queries against it.  In the fedmsg case, the problem was that the
source data overloaded the meaning of a field name, so I was able to preprocess
the records and rename the fields.  Maybe a good solution would be to have
Spark SQL automatically rename fields that have conflicting types in different
records (e.g. to “branches_1” and “branches_2” in this case).
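
To make the renaming idea concrete, here is a minimal sketch in plain Scala. It
is not Spark SQL code and does not reflect how schema inference is implemented.
It assumes the records have already been parsed into Scala collections (JSON
objects as Map[String, Any], JSON arrays as Seq[Any]); the names
RenameConflictingFields, typeTag and renameConflicts are hypothetical and exist
only for illustration.

```scala
// Hypothetical sketch: rename a field whose values have conflicting types
// across records. Records are assumed to be pre-parsed JSON, with objects
// as Map[String, Any] and arrays as Seq[Any].
object RenameConflictingFields {
  type Record = Map[String, Any]

  // Coarse type tag for a parsed JSON value; arrays are tagged by the
  // set of element tags they contain.
  def typeTag(value: Any): String = value match {
    case null                                        => "null"
    case _: String                                   => "string"
    case _: Boolean                                  => "boolean"
    case _: Int | _: Long | _: Float | _: Double     => "number"
    case _: Map[_, _]                                => "object"
    case xs: Seq[_] =>
      "array<" + xs.map(typeTag).distinct.sorted.mkString("|") + ">"
    case other                                       => other.getClass.getSimpleName
  }

  // If `field` appears with more than one type tag, rename it to
  // `field_1`, `field_2`, ... (numbered by first appearance of each tag).
  // If all occurrences share one tag, the records are returned unchanged.
  def renameConflicts(records: Seq[Record], field: String): Seq[Record] = {
    val tags = records.flatMap(_.get(field)).map(typeTag).distinct
    if (tags.size <= 1) records
    else {
      val suffix = tags.zipWithIndex.map { case (t, i) => t -> (i + 1) }.toMap
      records.map { r =>
        r.get(field) match {
          case Some(v) => (r - field) + (s"${field}_${suffix(typeTag(v))}" -> v)
          case None    => r
        }
      }
    }
  }
}
```

Applied to the two records from the issue description, with "branches" as the
field, the first record would end up with a "branches_1" key (an array of
strings) and the second with a "branches_2" key (an array of objects), so each
renamed field has a single consistent type.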

> JSON schema inference failed when dealing with type conflicts in arrays
> -----------------------------------------------------------------------
>
>                 Key: SPARK-4185
>                 URL: https://issues.apache.org/jira/browse/SPARK-4185
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> {code}
> val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext)
> val diverging = sparkContext.parallelize(List("""{"branches": ["foo"]}""", 
> """{"branches": [{"foo":42}]}"""))
> sqlContext.jsonRDD(diverging)  // throws a MatchError
> {code}
> The case is from http://chapeau.freevariable.com/2014/10/fedmsg-and-spark.html


