[
https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Lian updated SPARK-19716:
-------------------------------
Fix Version/s: 2.2.0 (was: 2.3.0)
> Dataset should allow by-name resolution for struct type elements in array
> -------------------------------------------------------------------------
>
> Key: SPARK-19716
> URL: https://issues.apache.org/jira/browse/SPARK-19716
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>
> If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it
> to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: the {{a}}
> and {{c}} columns are extracted by name to build {{Data}}.
> However, if the struct is inside an array, e.g. the schema is
> {{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to
> a Dataset with {{case class ComplexData(arr: Seq[Data])}}, the conversion
> fails. The reason is that, to allow compatible types, e.g. converting
> {{a: int}} to {{case class A(a: Long)}}, we add a cast for each field except
> struct type fields, because struct types are flexible: the number of columns
> may not match. We should probably also skip the cast for array and map
> types.
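
The two cases in the description can be reproduced with a minimal sketch
along the following lines. It assumes spark-sql is on the classpath and runs
a local session. {{Data}} and {{ComplexData}} come from the description;
{{Wide}}, {{WideArr}}, and the {{ByNameResolutionSketch}} object are
hypothetical names introduced here only to build input that carries the extra
column {{b}}. The nested conversion is wrapped in {{Try}} because its outcome
depends on whether the Spark version in use includes this change.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Target types taken from the issue description.
case class Data(a: Int, c: Int)
case class ComplexData(arr: Seq[Data])

// Hypothetical source-side types: they only exist to produce the
// schemas {a, b, c} and arr: array<struct<a, b, c>>.
case class Wide(a: Int, b: Int, c: Int)
case class WideArr(arr: Seq[Wide])

object ByNameResolutionSketch {

  // Top-level case: the DataFrame has columns a, b, c and the target
  // case class picks a and c by name.
  def flatToData(spark: SparkSession): Seq[Data] = {
    import spark.implicits._
    Seq(Wide(1, 2, 3)).toDF().as[Data].collect().toSeq
  }

  // Nested case: the same struct sits inside an array. The issue reports
  // that this conversion fails without by-name resolution for array
  // elements, so the result is captured in a Try instead of assumed.
  def nestedToComplexData(spark: SparkSession): Try[Seq[ComplexData]] = {
    import spark.implicits._
    Try(Seq(WideArr(Seq(Wide(1, 2, 3)))).toDF().as[ComplexData].collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-19716-sketch")
      .getOrCreate()
    try {
      println(s"flat:   ${flatToData(spark)}")
      println(s"nested: ${nestedToComplexData(spark)}")
    } finally {
      spark.stop()
    }
  }
}
```

Running {{main}} on versions before and after the fix should make the
difference visible in the {{nested}} line, while the {{flat}} line stays the
same.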
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)