[ https://issues.apache.org/jira/browse/SPARK-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-2789:
------------------------------------

    Component/s: SQL

> Apply names to RDD to becoming SchemaRDD
> ----------------------------------------
>
>                 Key: SPARK-2789
>                 URL: https://issues.apache.org/jira/browse/SPARK-2789
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Davies Liu
>
> To simplify applying a schema, we could add an API called applyNames(),
> which infers the types in the RDD, creates a schema with the given names,
> and then applies this schema to the RDD to produce a SchemaRDD. The names
> could be provided as a single string, separated by spaces.
> For example:
> rdd = sc.parallelize([("Alice", 10)])
> srdd = sqlCtx.applyNames(rdd, "name age")
> Users wouldn't need to create a case class or StructType to get the full
> power of Spark SQL.
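A minimal sketch of the inference step described above, assuming plain Python lists stand in for an RDD and that only the first row is inspected. `apply_names` and the Python-to-Spark-SQL type-name mapping are hypothetical illustrations, not PySpark's API or its actual inference rules.

```python
# Hypothetical mapping from Python value types to Spark SQL type names.
# Spark's real inference rules may differ; this only illustrates the idea.
TYPE_NAMES = {bool: "BooleanType", int: "LongType", float: "DoubleType", str: "StringType"}


def apply_names(rows, names):
    """Hypothetical sketch of applyNames(): pair space-separated names with
    types inferred from the first row, then attach the names to every row."""
    field_names = names.split()
    if not rows:
        raise ValueError("cannot infer types from an empty dataset")
    first = rows[0]
    if len(first) != len(field_names):
        raise ValueError(f"{len(field_names)} names given for {len(first)} columns")
    schema = []
    for name, value in zip(field_names, first):
        if type(value) not in TYPE_NAMES:
            raise TypeError(f"cannot infer a type for field {name!r}: {type(value).__name__}")
        schema.append((name, TYPE_NAMES[type(value)]))
    # Each row becomes a name -> value record, standing in for a SchemaRDD row.
    records = [dict(zip(field_names, row)) for row in rows]
    return schema, records


schema, records = apply_names([("Alice", 10)], "name age")
print(schema)  # [('name', 'StringType'), ('age', 'LongType')]
```

Inferring from a single row keeps the sketch short; a real implementation would likely need to look at more rows to handle missing or mixed values.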
> The string representation of the schema could also support nested
> structures (MapType, ArrayType and StructType), for example:
> "name age address(city zip) likes[title stars] props{[value type]}"
> This would be equivalent to the following untyped schema:
> root
> |--name
> |--age
> |--address
> |--|--city
> |--|--zip
> |--likes
> |--|--element
> |--|--|--title
> |--|--|--stars
> |--props
> |--|--key:
> |--|--value:
> |--|--|--element
> |--|--|--|--value
> |--|--|--|--type
> Field names are separated by spaces. If a field has a nested type, its
> structure follows the name immediately (without a space) and starts with
> "(" (StructType), "[" (ArrayType) or "{" (MapType).
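A sketch of a recursive-descent parser for the proposed syntax, under these assumptions: the contents of "(...)" are always a field list (a struct); the contents of "[...]" or "{...}" are either another bracketed type or a field list that becomes a struct; and a map's key type is left to inference. `parse_schema_string`, `render` and the tuple representation are hypothetical; `render` prints the same tree shape as the example schema in the issue (without the colons after key and value).

```python
DELIMS = " ()[]{}"
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}


class _Parser:
    """Hypothetical recursive-descent parser for the proposed schema string."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Returns "" at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def skip_spaces(self):
        while self.peek() == " ":
            self.pos += 1

    def parse_fields(self, closer=""):
        """Parse space-separated fields up to `closer` ("" means end of input)."""
        fields = []
        while True:
            self.skip_spaces()
            ch = self.peek()
            if ch == closer:
                return ("struct", fields)
            if ch == "" or ch in ")]}":
                raise ValueError(f"unexpected {ch or 'end of input'!r} at position {self.pos}")
            fields.append(self.parse_field())

    def parse_field(self):
        start = self.pos
        while self.peek() and self.peek() not in DELIMS:
            self.pos += 1
        name = self.text[start:self.pos]
        if not name:
            raise ValueError(f"expected a field name at position {start}")
        # A nested type must follow the name with no space in between.
        nested = self.parse_nested() if self.peek() in OPEN_TO_CLOSE else None
        return (name, nested)

    def parse_nested(self):
        opener = self.text[self.pos]
        closer = OPEN_TO_CLOSE[opener]
        self.pos += 1
        self.skip_spaces()
        if opener != "(" and self.peek() in OPEN_TO_CLOSE:
            # "[...]" or "{...}" wrapping another bracketed type, e.g. "{[a b]}".
            body = self.parse_nested()
            self.skip_spaces()
            if self.peek() != closer:
                raise ValueError(f"expected {closer!r} at position {self.pos}")
        else:
            body = self.parse_fields(closer)
        self.pos += 1  # consume the closing bracket
        if opener == "(":
            return body
        return ("array" if opener == "[" else "map", body)


def parse_schema_string(text):
    """Parse e.g. "name age address(city zip)" into nested tuples."""
    return _Parser(text).parse_fields()


def render(node, depth=1):
    """Yield tree lines in the issue's '|--' style; leaf types are left to inference."""
    kind, body = node
    prefix = "|--" * depth
    if kind == "struct":
        for name, nested in body:
            yield prefix + name
            if nested is not None:
                yield from render(nested, depth + 1)
    elif kind == "array":
        yield prefix + "element"
        yield from render(body, depth + 1)
    else:  # map: the key type is inferred; `body` describes the value
        yield prefix + "key"
        yield prefix + "value"
        yield from render(body, depth + 1)


schema = parse_schema_string("name age address(city zip) likes[title stars] props{[value type]}")
print("\n".join(["root", *render(schema)]))
```

Keeping the parser separate from type inference means the same parsed tree could later be combined with types inferred from the data.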



--
This message was sent by Atlassian JIRA
(v6.2#6252)
