There are different ways to view this. If it's confusing to think of the Source API as returning DataFrames, it's equivalent to thinking of it as returning a Dataset[Row], with DataFrame just a shorthand. DataFrame/Dataset[Row] is to Dataset[String] what Java's Array[Object] is to Array[String]. DataFrame is the more general form, since every typed Dataset can be lowered to a DataFrame. So to keep the Source API general (and also source-compatible with 1.x), it returns DataFrame.
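
To make the alias concrete, here is a rough sketch against the Spark 2.x APIs (the local SparkSession setup is purely illustrative) of a typed Dataset being lowered to a DataFrame and recovered with .as[T]:

  import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

  val spark = SparkSession.builder().master("local[*]").appName("alias-demo").getOrCreate()
  import spark.implicits._

  // DataFrame is just `type DataFrame = Dataset[Row]` in the sql package object.
  val typed: Dataset[String] = Seq("a", "b", "c").toDS()
  val general: DataFrame = typed.toDF()        // Dataset[String] -> Dataset[Row]

  // Recovering the typed view takes an explicit .as[T], which is why a
  // general-purpose API like Source.getBatch can only promise a DataFrame.
  val recovered: Dataset[String] = general.as[String]
  recovered.show()
  spark.stop()
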
On Thu, Jun 16, 2016 at 12:38 PM, Cody Koeninger <c...@koeninger.org> wrote:
> Is this really an internal / external distinction?
>
> For a concrete example, Source.getBatch seems to be a public
> interface, but returns DataFrame.
>
> On Thu, Jun 16, 2016 at 1:42 PM, Tathagata Das
> <tathagata.das1...@gmail.com> wrote:
> > DataFrame is a type alias of Dataset[Row], so externally it seems like
> > Dataset is the main type and DataFrame is a derivative type.
> > However, internally, since everything is processed as Rows, everything uses
> > DataFrames. Type classes used in a Dataset are internally converted to rows
> > for processing. Therefore internally DataFrame is the "main" type that is
> > used.
> >
> > On Thu, Jun 16, 2016 at 11:18 AM, Cody Koeninger <c...@koeninger.org> wrote:
> >>
> >> Sorry, meant DataFrame vs Dataset
> >>
> >> On Thu, Jun 16, 2016 at 12:53 PM, Cody Koeninger <c...@koeninger.org>
> >> wrote:
> >> > Is there a principled reason why sql.streaming.* and
> >> > sql.execution.streaming.* are making extensive use of DataFrame
> >> > instead of Datasource?
> >> >
> >> > Or is that just a holdover from code written before the move / type
> >> > alias?
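
For reference, the interface being discussed looks roughly like this (paraphrased from the 2.0-era code in sql.execution.streaming; the exact members may differ):

  import org.apache.spark.sql.DataFrame
  import org.apache.spark.sql.types.StructType

  trait Source {
    // Schema of the data produced by this source.
    def schema: StructType

    // Highest offset available, if any data has arrived yet.
    def getOffset: Option[Offset]

    // Data between the two offsets, returned as an untyped DataFrame
    // so the contract stays general across all sources.
    def getBatch(start: Option[Offset], end: Offset): DataFrame

    def stop(): Unit
  }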