[ 
https://issues.apache.org/jira/browse/SPARK-15550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian resolved SPARK-15550.
--------------------------------
    Resolution: Fixed

Issue resolved by pull request 13331
[https://github.com/apache/spark/pull/13331]

> Dataset.show() doesn't display inner nested structs properly
> ------------------------------------------------------------
>
>                 Key: SPARK-15550
>                 URL: https://issues.apache.org/jira/browse/SPARK-15550
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1, 2.0.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>
> Say we have the following nested case class:
> {code}
> case class ClassData(a: String, b: Int)
> case class NestedStruct(f: ClassData)
> {code}
> For a Dataset {{ds}} of {{NestedStruct}}, {{ds.show()}} should convert all 
> case class instances, including the inner nested {{ClassData}}, into {{Row}} 
> instances before displaying them. However, {{ClassData}} instances are just 
> displayed using {{toString}}.
> {code}
> val data = Seq(
>   "{'f': {'b': 1, 'a': 'foo'}}",
>   "{'f': {'b': 2, 'a': 'bar'}}"
> )
> val df = spark.read.json(sc.parallelize(data))
> val ds = df.as[NestedStruct]
> ds.show()
> {code}
> Actual output:
> {noformat}
> +----------------+
> |               f|
> +----------------+
> |ClassData(foo,1)|
> |ClassData(bar,2)|
> +----------------+
> {noformat}
> Expected output:
> {noformat}
> +-------+
> |      f|
> +-------+
> |[1,foo]|
> |[2,bar]|
> +-------+
> {noformat}
> This is not a big deal for Scala users, since Scala case classes always 
> come with a well-defined default {{toString}} method. Java beans, however, 
> do not, so their nested values may be displayed in an unreadable form.
> Another consideration is that a Dataset is just a view of the underlying 
> logical plan, and the domain object type may not refer to all fields defined 
> in that plan. Users are still allowed to access these extra fields through 
> methods like {{Dataset.col}}. For this reason, we decided to let 
> {{Dataset.show()}} delegate directly to {{Dataset.toDF().show()}}, which 
> shows all fields defined in the logical plan.
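
The behaviour described above can be sketched as a small standalone program. This is only an illustration, not the code from the pull request: the object name {{ShowNestedStruct}}, the local master setting, the {{extra}} column, and building the Dataset from case class instances instead of JSON are all assumptions made to keep the example self-contained. It assumes Spark 2.x on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Case classes taken from the issue description.
case class ClassData(a: String, b: Int)
case class NestedStruct(f: ClassData)

// Hypothetical driver object, named here only for illustration.
object ShowNestedStruct {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")               // assumption: run locally
      .appName("SPARK-15550-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      NestedStruct(ClassData("foo", 1)),
      NestedStruct(ClassData("bar", 2))
    ).toDS()

    // Going through the untyped DataFrame view displays the nested
    // struct as row data instead of relying on ClassData.toString.
    ds.toDF().show()

    // A Dataset whose domain type covers only some of the columns in the plan.
    val wide = Seq(("foo", 1, true)).toDF("a", "b", "extra")
    val narrow = wide.as[ClassData]

    // The extra column is still reachable through Dataset.col.
    narrow.select(narrow.col("extra")).show()

    spark.stop()
  }
}
```

On versions affected by this issue, calling {{ds.toDF().show()}} explicitly is a reasonable way to get the row-style rendering. The exact text printed for a struct value differs between Spark versions, so no expected output is listed here.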



