[ https://issues.apache.org/jira/browse/SPARK-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-2789:
------------------------------------
Component/s: SQL
> Apply names to RDD to become SchemaRDD
> --------------------------------------
>
> Key: SPARK-2789
> URL: https://issues.apache.org/jira/browse/SPARK-2789
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Davies Liu
>
> To simplify applying a schema, we could add an API called applyNames(). It
> would infer the types in the RDD, create a schema with the given names, and
> then apply that schema to the RDD to produce a SchemaRDD. The names could be
> provided as a single String, separated by spaces.
> For example:
> rdd = sc.parallelize([("Alice", 10)])
> srdd = sqlCtx.applyNames(rdd, "name age")
> Users would then not need to create a case class or a StructType to get the
> full power of Spark SQL.
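applyNames() is only a proposal in this issue. The following is a minimal pure-Python sketch of the idea, not a Spark API. `apply_names` is a hypothetical helper. It assumes field types are inferred from the first row only. The Python-type-to-type-name mapping is also an illustrative assumption.

```python
def apply_names(rows, names):
    """Hypothetical sketch of applyNames(): pair space-separated names with
    types inferred from the first row (assumption: first row is representative)."""
    field_names = names.split()
    first = rows[0]
    if len(first) != len(field_names):
        raise ValueError("expected %d names, got %d" % (len(first), len(field_names)))
    # Illustrative mapping from Python types to schema type names (assumed).
    type_names = {str: "string", int: "long", float: "double", bool: "boolean"}
    schema = [(n, type_names[type(v)]) for n, v in zip(field_names, first)]
    # Attach the names to every row so fields can be accessed by name.
    records = [dict(zip(field_names, row)) for row in rows]
    return schema, records
```

For the issue's example, `apply_names([("Alice", 10)], "name age")` yields the schema `[("name", "string"), ("age", "long")]`. The user supplies only names, and the types come from the data.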
> The string representation of the schema could also support nested structures
> (MapType, ArrayType and StructType), for example:
> "name age address(city zip) likes[title stars] props{[value type]}"
> It would be equivalent to the following schema (types omitted):
> root
> |--name
> |--age
> |--address
> |--|--city
> |--|--zip
> |--likes
> |--|--element
> |--|--|--title
> |--|--|--stars
> |--props
> |--|--key:
> |--|--value:
> |--|--|--element
> |--|--|--|--value
> |--|--|--|--type
> All field names are separated by spaces. If a field has a nested type, its
> structure follows the name without a space and starts with "(" (StructType),
> "[" (ArrayType) or "{" (MapType).
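The issue does not spell out the full grammar. The following is a minimal recursive-descent sketch of one reading of it. It assumes `[...]` wraps a field list, giving an array of structs as in `likes[title stars]`. It also assumes `{...}` wraps only the map's value type, with the key type left unspecified as in the tree above. `parse_schema` and `render` are hypothetical helpers, not Spark APIs. Leaf fields map to `None` because their types would be inferred from the data.

```python
def parse_schema(spec):
    """Parse a space-separated schema string into (name, nested_type) pairs."""
    fields, _ = _parse_fields(spec, 0, None)
    return fields


def _parse_fields(s, i, closer):
    # Read field names until `closer` (or end of input at the top level).
    fields = []
    while i < len(s):
        c = s[i]
        if c == closer:
            return fields, i + 1
        if c.isspace():
            i += 1
            continue
        start = i
        while i < len(s) and not s[i].isspace() and s[i] not in "()[]{}":
            i += 1
        name = s[start:i]
        if not name:
            raise ValueError("expected a field name at position %d" % i)
        nested = None
        # A nested type follows the name directly, with no space.
        if i < len(s) and s[i] in "([{":
            nested, i = _parse_nested(s, i)
        fields.append((name, nested))
    if closer is not None:
        raise ValueError("missing %r" % closer)
    return fields, i


def _parse_nested(s, i):
    opener = s[i] if i < len(s) else ""
    if opener == "(":  # StructType
        fields, i = _parse_fields(s, i + 1, ")")
        return ("struct", fields), i
    if opener == "[":  # ArrayType, assumed to hold structs
        fields, i = _parse_fields(s, i + 1, "]")
        return ("array", ("struct", fields)), i
    if opener == "{":  # MapType, assumed to wrap only the value type
        value, i = _parse_nested(s, i + 1)
        if i >= len(s) or s[i] != "}":
            raise ValueError("missing '}'")
        return ("map", value), i + 1
    raise ValueError("expected '(', '[' or '{' at position %d" % i)


def render(fields):
    """Render parsed fields as the 'root / |--' outline used in the issue."""
    lines = ["root"]

    def walk(t, depth):
        prefix = "|--" * depth
        if t is None:  # leaf: type left to inference
            return
        if t[0] == "struct":
            for name, sub in t[1]:
                lines.append(prefix + name)
                walk(sub, depth + 1)
        elif t[0] == "array":
            lines.append(prefix + "element")
            walk(t[1], depth + 1)
        elif t[0] == "map":
            lines.append(prefix + "key:")
            lines.append(prefix + "value:")
            walk(t[1], depth + 1)

    walk(("struct", fields), 1)
    return "\n".join(lines)
```

`render(parse_schema("name age address(city zip) likes[title stars] props{[value type]}"))` reproduces the tree structure from the issue, including the `element`, `key:` and `value:` levels. A different reading of `[...]`, such as an array of a single unnamed primitive, would need its own rule in `_parse_nested`.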
--
This message was sent by Atlassian JIRA
(v6.2#6252)