[ https://issues.apache.org/jira/browse/SPARK-11941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032931#comment-15032931 ]
Henri DF edited comment on SPARK-11941 at 12/1/15 2:14 AM:
-----------------------------------------------------------
I think "might be nicer if it was flat" is a bit of an understatement.
The current representation isn't of much use with nested structs. If it's hard
to fix, wouldn't it be better to make this private rather than leave it exposed
in its current state?
was (Author: henridf):
I think "might be nicer if it was flat" is a bit of an understatement.
The current representation isn't of much use with nested structs. If it's hard
to fix, would it be better to remove this than leave it in its current state?
> JSON representation of nested StructTypes could be more uniform
> ---------------------------------------------------------------
>
> Key: SPARK-11941
> URL: https://issues.apache.org/jira/browse/SPARK-11941
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Henri DF
>
> I have a json file with a single row {code}{"a":1, "b": 1.0, "c": "asdfasd",
> "d":[1, 2, 4]}{code} After reading that file in, the schema is correctly
> inferred:
> {code}
> scala> df.printSchema
> root
>  |-- a: long (nullable = true)
>  |-- b: double (nullable = true)
>  |-- c: string (nullable = true)
>  |-- d: array (nullable = true)
>  |    |-- element: long (containsNull = true)
> {code}
> However, the json representation has a strange nesting under "type" for
> column "d":
> {code}
> scala> df.collect()(0).schema.prettyJson
> res60: String =
> {
>   "type" : "struct",
>   "fields" : [ {
>     "name" : "a",
>     "type" : "long",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "b",
>     "type" : "double",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "c",
>     "type" : "string",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "d",
>     "type" : {
>       "type" : "array",
>       "elementType" : "long",
>       "containsNull" : true
>     },
>     "nullable" : true,
>     "metadata" : { }
>   } ]
> }
> {code}
> Specifically, in the last element, "type" is an object instead of being a
> string. I would expect the last element to be:
> {code}
> {
>   "name":"d",
>   "type":"array",
>   "elementType":"long",
>   "containsNull":true,
>   "nullable":true,
>   "metadata":{}
> }
> {code}
> There's a similar issue for nested structs.
> (I ran into this while writing node.js bindings and wanted to recurse down
> this representation, which would be nicer if it was uniform.)
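
Until the representation changes, a consumer can normalize it on its own side. The following is a minimal sketch of the flattening described above, written in Python rather than node.js. The helper names flatten_type and flatten_field are hypothetical and not part of Spark, and the sketch assumes that nested structs serialize as {"type": "struct", "fields": [...]}, the same shape as the top level of the output quoted in the description.

```python
import json

# Schema JSON in the shape quoted in the issue description (trimmed to two fields).
SCHEMA_JSON = """
{
  "type": "struct",
  "fields": [
    {"name": "a", "type": "long", "nullable": true, "metadata": {}},
    {"name": "d",
     "type": {"type": "array", "elementType": "long", "containsNull": true},
     "nullable": true,
     "metadata": {}}
  ]
}
"""


def flatten_type(type_node):
    """Normalize a type node, which may be a plain string or a nested object."""
    if isinstance(type_node, str):
        return {"type": type_node}
    node = dict(type_node)
    # Assumption: a struct carries its children under "fields".
    if "fields" in node:
        node["fields"] = [flatten_field(f) for f in node["fields"]]
    # An array element may itself be a complex type; normalize it in place.
    if isinstance(node.get("elementType"), dict):
        node["elementType"] = flatten_type(node["elementType"])
    return node


def flatten_field(field):
    """Merge a field's nested "type" object into the field itself."""
    flat = {"name": field["name"]}
    flat.update(flatten_type(field["type"]))
    for key, value in field.items():
        if key not in ("name", "type"):
            flat[key] = value
    return flat


schema = json.loads(SCHEMA_JSON)
flat_schema = flatten_type(schema)
print(json.dumps(flat_schema["fields"][-1], indent=2))
```

After this pass, "type" is always a string on every field, so code that recurses down the schema only has to check for "fields" or "elementType" to find children.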