[
https://issues.apache.org/jira/browse/SPARK-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tycho Grouwstra resolved SPARK-11431.
-------------------------------------
Resolution: Implemented
> Allow exploding arrays of structs in DataFrames
> -----------------------------------------------
>
> Key: SPARK-11431
> URL: https://issues.apache.org/jira/browse/SPARK-11431
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Tycho Grouwstra
> Labels: features
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> I am creating DataFrames from some
> [JSON data|http://www.kayak.com/h/explore/api?airport=AMS] and would like to
> explode an array of structs (common in JSON) into separate rows, so that I
> can start analyzing the data using GraphX. I expect others would find this
> useful as well, since much web data is in JSON format.
> This feature would build upon the existing `explode` functionality added to
> DataFrames by [~marmbrus], which currently fails when called on such
> arrays of `InternalRow`s. The failure stems from `explode`'s use of the
> `schemaFor` function to infer column types: this approach is insufficient for
> Rows, since their type does not carry the required field information. The
> alternative would be to take the schema info from the DataFrame's existing
> schema in such cases.
> I'm working on a patch that might add this functionality, so stay tuned
> until I've figured that out. I'm new here, though, so I would welcome
> feedback.
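
The idea in the description above (take the element schema from the existing schema instead of inferring it from the row type) can be sketched in plain Python. This is only an illustrative model under stated assumptions, not Spark's API or the patch itself: the function name `explode_structs`, the dict-based schema layout with an `array_of` key, and the sample data are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical, simplified stand-ins for a DataFrame schema and its rows.
# A column that holds an array of structs is described as
# {"array_of": {field_name: type_name, ...}}.
Schema = Dict[str, Any]
Row = Dict[str, Any]


def explode_structs(
    rows: List[Row], schema: Schema, column: str
) -> Tuple[List[Row], Schema]:
    """Turn each struct in `column` into its own row.

    The struct's field names and types are read from `schema`, not inferred
    from the row values, which carry no type information of their own.
    """
    column_type = schema[column]
    element_fields = (
        column_type.get("array_of") if isinstance(column_type, dict) else None
    )
    if not isinstance(element_fields, dict):
        raise TypeError(f"column {column!r} is not an array of structs")

    # Output schema: every other column unchanged, plus one column per struct
    # field. A struct field with the same name as an existing column would
    # overwrite it; this sketch does not handle that case.
    out_schema: Schema = {
        name: type_ for name, type_ in schema.items() if name != column
    }
    out_schema.update(element_fields)

    out_rows: List[Row] = []
    for row in rows:
        rest = {name: value for name, value in row.items() if name != column}
        # A row whose array is empty or missing yields no output rows.
        for struct in row.get(column) or []:
            new_row = dict(rest)
            for field in element_fields:
                new_row[field] = struct.get(field)
            out_rows.append(new_row)
    return out_rows, out_schema


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample_schema = {
        "origin": "string",
        "destinations": {"array_of": {"city": "string", "price": "int"}},
    }
    sample_rows = [
        {
            "origin": "AMS",
            "destinations": [
                {"city": "Lisbon", "price": 120},
                {"city": "Oslo", "price": 95},
            ],
        },
        {"origin": "RTM", "destinations": []},
    ]
    exploded, exploded_schema = explode_structs(
        sample_rows, sample_schema, "destinations"
    )
    for r in exploded:
        print(r)
    print(exploded_schema)
```

Because the field list comes from the schema, a struct that lacks a field still produces a column for it (filled with `None`), which inference from the values alone could not guarantee.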