Re: renaming SchemaRDD -> DataFrame

Koert Kuipers Mon, 26 Jan 2015 16:29:57 -0800

"The context is that SchemaRDD is becoming a common data format used for
bringing data into Spark from external systems, and used for various
components of Spark, e.g. MLlib's new pipeline API."


i agree. this to me also implies it belongs in spark core, not sql

On Mon, Jan 26, 2015 at 6:11 PM, Michael Malak <
michaelma...@yahoo.com.invalid> wrote:

> And on the off chance that anyone hasn't seen it yet, the Jan. 13 Bay Area
> Spark Meetup YouTube video contained a wealth of background information on
> this idea (mostly from Patrick and Reynold :-).
>
> https://www.youtube.com/watch?v=YWppYPWznSQ
>
> ________________________________
> From: Patrick Wendell <pwend...@gmail.com>
> To: Reynold Xin <r...@databricks.com>
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Sent: Monday, January 26, 2015 4:01 PM
> Subject: Re: renaming SchemaRDD -> DataFrame
>
>
> One thing potentially not clear from this e-mail: there will be a 1:1
> correspondence where you can get an RDD to/from a DataFrame.
>
>
> On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:
> > Hi,
> >
> > We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to
> > get the community's opinion.
> >
> > The context is that SchemaRDD is becoming a common data format used for
> > bringing data into Spark from external systems, and used for various
> > components of Spark, e.g. MLlib's new pipeline API. We also expect more and
> > more users to be programming directly against the SchemaRDD API rather than
> > the core RDD API. SchemaRDD, through its less commonly used DSL originally
> > designed for writing test cases, has always had a data-frame-like API. In
> > 1.3, we are redesigning the API to make it usable for end users.
> >
> >
> > There are two motivations for the renaming:
> >
> > 1. DataFrame seems to be a more self-evident name than SchemaRDD.
> >
> > 2. SchemaRDD/DataFrame is actually not going to be an RDD anymore (even
> > though it would contain some RDD functions like map, flatMap, etc.), and
> > calling it Schema*RDD* while it is not an RDD is highly confusing. Instead,
> > DataFrame.rdd will return the underlying RDD for all RDD methods.
> >
> >
> > My understanding is that very few users program directly against the
> > SchemaRDD API at the moment, because it is not well documented. However,
> > to maintain backward compatibility, we can create a type alias for DataFrame
> > that is still named SchemaRDD. This will maintain source compatibility for
> > Scala. That said, we will have to update all existing materials to use
> > DataFrame rather than SchemaRDD.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>
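
For anyone following along, here is a rough Scala sketch of the two mechanisms mentioned in the quoted proposal: a type alias that keeps the old SchemaRDD name compiling, and an rdd accessor on a class that is not itself an RDD. ToyRDD, columnCount and this DataFrame class are stand-ins invented for illustration; they are not Spark's real classes, and the actual 1.3 API may look different.

```scala
// Stand-in types for illustration only; these are NOT Spark's real classes.
object RenameSketch {
  // Hypothetical minimal "RDD": just wraps a sequence of rows.
  final class ToyRDD[T](val rows: Seq[T]) {
    def map[U](f: T => U): ToyRDD[U] = new ToyRDD(rows.map(f))
  }

  // Hypothetical "DataFrame": carries a schema plus data, and does not extend ToyRDD.
  final class DataFrame(val schema: Seq[String], data: Seq[Seq[Any]]) {
    // Mirrors the proposal's DataFrame.rdd: hand back the underlying RDD.
    def rdd: ToyRDD[Seq[Any]] = new ToyRDD(data)
  }

  // Type alias: the old name still resolves, now to the new class.
  type SchemaRDD = DataFrame

  // Code written against the old name keeps compiling unchanged.
  def columnCount(s: SchemaRDD): Int = s.schema.length

  def main(args: Array[String]): Unit = {
    val df = new DataFrame(Seq("name", "age"), Seq(Seq("a", 1), Seq("b", 2)))
    println(columnCount(df))         // a DataFrame is accepted where SchemaRDD is expected
    println(df.rdd.map(_.head).rows) // RDD-style operations go through .rdd
  }
}
```

Note that a type alias like this only helps Scala source compatibility, which matches the caveat in the proposal; it does not by itself say anything about Java or Python callers.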
