[ https://issues.apache.org/jira/browse/SPARK-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193491#comment-14193491 ]
William Benton commented on SPARK-4185:
---------------------------------------
I'm actually not sure this is a bug! My main concern is that inferring any
single type for this collection of objects would make it very difficult to
write meaningful queries. In the fedmsg case, the problem was that the source
data overloaded the meaning of a field name, so I was able to preprocess the
fields and rename them. One possible solution would be for Spark SQL to
automatically rename fields whose types conflict across records
(e.g., to “branches_1” and “branches_2” in this case).
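
As a rough illustration of the renaming idea, here is a minimal sketch that works on records already parsed into plain Scala maps, so it needs no JSON library and no Spark. `RenameConflicts`, `shape`, and `rename` are hypothetical names made up for this example; nothing here is Spark SQL API, and Spark SQL does not necessarily behave this way.

```scala
// Hypothetical helper, not part of Spark SQL: when a field name carries values
// of more than one "shape" across records, suffix the name with _1, _2, ...
// in order of first appearance of each shape.
object RenameConflicts {
  // Coarse type tag for an already-parsed JSON value (assumed representation:
  // Map for objects, Seq for arrays, Scala primitives and String for scalars).
  def shape(v: Any): String = v match {
    case null                           => "null"
    case _: String                      => "string"
    case _: Boolean                     => "boolean"
    case _: Int | _: Long | _: Double   => "number"
    case _: Map[_, _]                   => "object"
    // Note: an empty array gets the tag "array<>", which this sketch treats
    // as distinct from any non-empty array.
    case s: Seq[_]                      => "array<" + s.map(shape).distinct.sorted.mkString("|") + ">"
    case other                          => other.getClass.getSimpleName
  }

  def rename(records: Seq[Map[String, Any]]): Seq[Map[String, Any]] = {
    // Field name -> distinct shapes, in order of first appearance.
    val shapes: Map[String, Seq[String]] =
      records.flatMap(_.toSeq)
        .groupBy(_._1)
        .map { case (name, kvs) => name -> kvs.map(kv => shape(kv._2)).distinct }

    records.map(_.map { case (name, value) =>
      val seen = shapes(name)
      if (seen.size <= 1) name -> value // no conflict: keep the name
      else s"${name}_${seen.indexOf(shape(value)) + 1}" -> value
    })
  }

  def main(args: Array[String]): Unit = {
    // The two records from the issue, as pre-parsed maps.
    val records = Seq(
      Map[String, Any]("branches" -> Seq("foo")),
      Map[String, Any]("branches" -> Seq(Map("foo" -> 42)))
    )
    rename(records).foreach(println)
  }
}
```

With the two records from the issue, the first ends up keyed by `branches_1` and the second by `branches_2`. The sketch only looks at top-level fields and does not guard against a record that already contains a field named `branches_1`; both would need handling in anything beyond an illustration.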
> JSON schema inference failed when dealing with type conflicts in arrays
> -----------------------------------------------------------------------
>
> Key: SPARK-4185
> URL: https://issues.apache.org/jira/browse/SPARK-4185
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.0
> Reporter: Yin Huai
> Assignee: Yin Huai
>
> {code}
> val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext)
> val diverging = sparkContext.parallelize(List("""{"branches": ["foo"]}""",
> """{"branches": [{"foo":42}]}"""))
> sqlContext.jsonRDD(diverging) // throws a MatchError
> {code}
> The case is from http://chapeau.freevariable.com/2014/10/fedmsg-and-spark.html