[ https://issues.apache.org/jira/browse/SPARK-15550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian resolved SPARK-15550.
--------------------------------
Resolution: Fixed
Issue resolved by pull request 13331
[https://github.com/apache/spark/pull/13331]
> Dataset.show() doesn't display inner nested structs properly
> ------------------------------------------------------------
>
> Key: SPARK-15550
> URL: https://issues.apache.org/jira/browse/SPARK-15550
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.1, 2.0.0
> Reporter: Cheng Lian
> Assignee: Cheng Lian
>
> Say we have the following nested case class:
> {code}
> case class ClassData(a: String, b: Int)
> case class NestedStruct(f: ClassData)
> {code}
> For a Dataset {{ds}} of {{NestedStruct}}, {{ds.show()}} should convert all
> case class instances, including the inner nested {{ClassData}}, into {{Row}}
> instances before displaying them. However, {{ClassData}} instances are just
> displayed using {{toString}}.
> {code}
> // Run in spark-shell, where spark, sc, and spark.implicits._ are in scope.
> val data = Seq(
>   "{'f': {'b': 1, 'a': 'foo'}}",
>   "{'f': {'b': 2, 'a': 'bar'}}"
> )
> val df = spark.read.json(sc.parallelize(data))
> val ds = df.as[NestedStruct]
> ds.show()
> {code}
> Actual output:
> {noformat}
> +----------------+
> |               f|
> +----------------+
> |ClassData(foo,1)|
> |ClassData(bar,2)|
> +----------------+
> {noformat}
> Expected output:
> {noformat}
> +-------+
> |      f|
> +-------+
> |[1,foo]|
> |[2,bar]|
> +-------+
> {noformat}
> This is not a big problem for Scala users, since Scala case classes always
> come with a well-defined default {{toString}} method, but Java beans do not.
> In addition, a Dataset is just a view of the underlying logical plan, and
> the domain object type may not cover all fields defined in that plan.
> Users can still access these extra fields through methods like
> {{Dataset.col}}. For this reason, we decided to let {{Dataset.show()}}
> delegate directly to {{Dataset.toDF().show()}}, which shows all fields
> defined in the logical plan.
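
The difference between the two renderings described above can be illustrated without Spark. The following is a minimal sketch in plain Scala, not Spark's implementation: `toRowString` and `RowLikeRendering` are hypothetical names introduced here, and the helper only imitates the bracketed, Row-like format by recursing into nested case classes. Fields are emitted in declaration order, so the order may differ from the expected output quoted in the issue.

```scala
// Domain classes from the issue description.
case class ClassData(a: String, b: Int)
case class NestedStruct(f: ClassData)

object RowLikeRendering {
  // Hypothetical helper, not part of Spark: renders any Product (case classes,
  // but also tuples, Options, and non-empty Lists) as a bracketed field list,
  // recursing into nested Products. Other values fall back to String.valueOf.
  def toRowString(value: Any): String = value match {
    case p: Product => p.productIterator.map(toRowString).mkString("[", ",", "]")
    case other      => String.valueOf(other)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      NestedStruct(ClassData("foo", 1)),
      NestedStruct(ClassData("bar", 2))
    )
    // Default case class toString keeps the class name, as in the actual output.
    rows.foreach(r => println(r.f))              // ClassData(foo,1), ClassData(bar,2)
    // Row-like rendering drops the class name, as in the expected output.
    rows.foreach(r => println(toRowString(r.f))) // [foo,1], [bar,2]
  }
}
```

In Spark itself no such helper should be needed: per the description, the fix makes `ds.show()` delegate to `ds.toDF().show()`, so on affected versions calling `ds.toDF().show()` explicitly is presumably the closest equivalent.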