[jira] [Created] (SPARK-15550) Dataset.show() doesn't display inner nested structs properly

Cheng Lian (JIRA) Wed, 25 May 2016 23:17:47 -0700

Cheng Lian created SPARK-15550:
----------------------------------

             Summary: Dataset.show() doesn't display inner nested structs properly
                 Key: SPARK-15550
                 URL: https://issues.apache.org/jira/browse/SPARK-15550
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.1, 2.0.0
            Reporter: Cheng Lian
            Assignee: Cheng Lian



Say we have the following nested case class:

{code}
case class ClassData(a: String, b: Int)
case class NestedStruct(f: ClassData)
{code}

For a Dataset {{ds}} of {{NestedStruct}}, {{ds.show()}} should convert all case 
class instances, including the inner nested {{ClassData}}, into {{Row}} 
instances before displaying them. However, {{ClassData}} instances are just 
displayed using {{toString}}.

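{code}
// Run in spark-shell, where `spark` and `sc` are predefined.
// Two JSON records, one per line, each holding a nested struct `f`.
val data =
  """{"f": {"b": 1, "a": "foo"}}
    |{"f": {"b": 2, "a": "bar"}}
    |""".stripMargin.trim.split("\n")

val df = spark.read.json(sc.parallelize(data))
val ds = df.as[NestedStruct]

// Display the typed Dataset; this call produces the output below.
ds.show()
{code}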

Actual output:

{noformat}
+----------------+
|               f|
+----------------+
|ClassData(foo,1)|
|ClassData(bar,2)|
+----------------+
{noformat}

Expected output:

{noformat}
+-------+
|      f|
+-------+
|[1,foo]|
|[2,bar]|
+-------+
{noformat}
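
The difference between the two outputs can be illustrated without Spark. The following is only a minimal sketch: {{render}} is a hypothetical helper, not Spark's implementation, and it assumes that a {{Row}} prints its fields as a bracketed, comma-separated list. It contrasts the default case class {{toString}} with a recursive, row-like rendering.

```scala
// Case classes from the issue description.
case class ClassData(a: String, b: Int)
case class NestedStruct(f: ClassData)

object RowLikeRendering {
  // Hypothetical helper (not part of Spark): renders any Product, such as a
  // case class instance, as a bracketed field list, recursing into nested
  // Products. Everything else falls back to its plain string form.
  def render(value: Any): String = value match {
    case p: Product => p.productIterator.map(render).mkString("[", ",", "]")
    case other      => String.valueOf(other)
  }

  def main(args: Array[String]): Unit = {
    val record = NestedStruct(ClassData("foo", 1))
    println(record.f.toString) // ClassData(foo,1)
    println(render(record.f))  // [foo,1]
  }
}
```

In this sketch the field order follows the case class declaration, whereas the order in a real {{Row}} follows the schema of the underlying DataFrame.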

This is not a big problem for Scala users, since Scala case classes always come 
with a well-defined default {{toString}} method. Java beans have no such 
default, so their output is only as readable as whatever {{toString}} they 
happen to provide.

Another point is that a Dataset is just a view of the underlying logical plan, 
and the domain object type may not refer to every field defined in that plan. 
Users can still access these extra fields through methods like 
{{Dataset.col}}. For this reason, we decided to let {{Dataset.show()}} 
delegate directly to {{Dataset.toDF().show()}}, which shows all fields defined 
in the logical plan.
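
The "view" behaviour described above can be sketched as follows. This is an illustration under stated assumptions rather than code from the patch: it assumes a local Spark 2.x session, and the object name {{DatasetViewSketch}} and the column name {{extra}} are made up for the example. The domain class covers only two of the three columns, yet the third stays reachable through {{Dataset.col}} and appears when the Dataset is shown through {{toDF()}}.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Domain class that covers only two of the three columns built below.
case class ClassData(a: String, b: Int)

object DatasetViewSketch {
  // Builds a Dataset[ClassData] on top of a three-column DataFrame.
  // "extra" is a hypothetical column that ClassData does not mention.
  def build(spark: SparkSession): Dataset[ClassData] = {
    import spark.implicits._
    Seq(("foo", 1, true), ("bar", 2, false))
      .toDF("a", "b", "extra")
      .as[ClassData]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-15550-sketch")
      .getOrCreate()

    val ds = build(spark)

    // The extra column is still part of the logical plan.
    ds.select(ds.col("extra")).show()

    // Showing the untyped view lists every column in the plan.
    ds.toDF().show()

    spark.stop()
  }
}
```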




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

