"The context is that SchemaRDD is becoming a common data format used for bringing data into Spark from external systems, and used for various components of Spark, e.g. MLlib's new pipeline API."
I agree. To me, this also implies it belongs in Spark core, not SQL.

On Mon, Jan 26, 2015 at 6:11 PM, Michael Malak <michaelma...@yahoo.com.invalid> wrote:

> And in the off chance that anyone hasn't seen it yet, the Jan. 13 Bay Area
> Spark Meetup YouTube contained a wealth of background information on this
> idea (mostly from Patrick and Reynold :-).
>
> https://www.youtube.com/watch?v=YWppYPWznSQ
>
> ________________________________
> From: Patrick Wendell <pwend...@gmail.com>
> To: Reynold Xin <r...@databricks.com>
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Sent: Monday, January 26, 2015 4:01 PM
> Subject: Re: renaming SchemaRDD -> DataFrame
>
> One thing potentially not clear from this e-mail, there will be a 1:1
> correspondence where you can get an RDD to/from a DataFrame.
>
> On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:
> > Hi,
> >
> > We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to
> > get the community's opinion.
> >
> > The context is that SchemaRDD is becoming a common data format used for
> > bringing data into Spark from external systems, and used for various
> > components of Spark, e.g. MLlib's new pipeline API. We also expect more and
> > more users to be programming directly against SchemaRDD API rather than the
> > core RDD API. SchemaRDD, through its less commonly used DSL originally
> > designed for writing test cases, always has the data-frame like API. In
> > 1.3, we are redesigning the API to make the API usable for end users.
> >
> > There are two motivations for the renaming:
> >
> > 1. DataFrame seems to be a more self-evident name than SchemaRDD.
> >
> > 2. SchemaRDD/DataFrame is actually not going to be an RDD anymore (even
> > though it would contain some RDD functions like map, flatMap, etc), and
> > calling it Schema*RDD* while it is not an RDD is highly confusing. Instead,
> > DataFrame.rdd will return the underlying RDD for all RDD methods.
> >
> > My understanding is that very few users program directly against the
> > SchemaRDD API at the moment, because they are not well documented. However,
> > to maintain backward compatibility, we can create a type alias DataFrame
> > that is still named SchemaRDD. This will maintain source compatibility for
> > Scala. That said, we will have to update all existing materials to use
> > DataFrame rather than SchemaRDD.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org