There is this pull request: https://github.com/apache/spark/pull/5713

We mean to merge it for 1.5. Maybe you can help review it too?

On Mon, Jul 27, 2015 at 11:23 AM, Vyacheslav Baranov <
slavik.bara...@gmail.com> wrote:

>  Hi all,
>
> Currently it's possible to convert an RDD of a case class to a DataFrame:
>
> case class Person(name: String, age: Int)
>
> val people: RDD[Person] = ...
> val df = sqlContext.createDataFrame(people)
>
> but the backward conversion is not possible with the existing API, so the
> code currently looks like this (example from the documentation):
>
> teenagers.map(t => "Name: " + t.getAs[String]("name"))
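>
> For context, in that documentation example teenagers is a DataFrame
> produced along these lines (a sketch, assuming a "people" table has
> already been registered as in the SQL programming guide):
>
> val teenagers = sqlContext.sql(
>   "SELECT name, age FROM people WHERE age >= 13 AND age <= 19")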
>
> whereas it would be much more convenient to use an RDD of the case class:
>
> teenagers.rdd[Person].map("Name: " + _.name)
>
>
> I've implemented a proof-of-concept library that converts a DataFrame to
> a typed RDD using the "Pimp my library" pattern. It adds some type safety
> (the conversion fails before running a distributed operation if some
> fields have incompatible types) and is much more convenient when working
> with nested rows, for example:
>
> case class Room(number: Int, visitors: Seq[Person])
>
> roomsDf.explode[Seq[Row], Person]("visitors", "visitor")(_.map(rowToPerson))
>
> Would the community be interested in having this functionality in core?
>
> Regards,
> Vyacheslav
>