There are different ways to view this. If it's confusing to think of the
Source API returning DataFrames, it is equivalent to think of it as
returning Dataset[Row], with DataFrame just being shorthand for that.
DataFrame/Dataset[Row] is to Dataset[String] what Java's Array[Object] is
to Array[String]. DataFrame is the more general type, since every Dataset
can be boiled down to a DataFrame. So to keep the Source API general (and
also source-compatible with 1.x), it returns DataFrame.
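To make the relationship concrete, here is a minimal sketch (the local
SparkSession and the sample data are just illustrative assumptions, not
anything the Source API itself provides):

  import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("dataframe-vs-dataset")
    .getOrCreate()
  import spark.implicits._

  // A strongly typed Dataset
  val names: Dataset[String] = Seq("scala", "java", "python").toDS()

  // Every Dataset can be viewed as a DataFrame, i.e. Dataset[Row],
  // which is the general form the Source API returns
  val df: DataFrame = names.toDF()

  // Given an encoder, a DataFrame can be turned back into a typed Dataset
  val typed: Dataset[String] = df.as[String]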

On Thu, Jun 16, 2016 at 12:38 PM, Cody Koeninger <c...@koeninger.org> wrote:

> Is this really an internal / external distinction?
>
> For a concrete example, Source.getBatch seems to be a public
> interface, but returns DataFrame.
>
> On Thu, Jun 16, 2016 at 1:42 PM, Tathagata Das
> <tathagata.das1...@gmail.com> wrote:
> > DataFrame is a type alias of Dataset[Row], so externally it seems like
> > Dataset is the main type and DataFrame is a derivative type.
> > However, internally, since everything is processed as Rows, everything
> > uses DataFrames. The type classes used in a Dataset are internally
> > converted to Rows for processing. Therefore, internally, DataFrame is
> > the "main" type that is used.
> >
> > On Thu, Jun 16, 2016 at 11:18 AM, Cody Koeninger <c...@koeninger.org>
> wrote:
> >>
> >> Sorry, meant DataFrame vs Dataset
> >>
> >> On Thu, Jun 16, 2016 at 12:53 PM, Cody Koeninger <c...@koeninger.org>
> >> wrote:
> >> > Is there a principled reason why sql.streaming.* and
> >> > sql.execution.streaming.* are making extensive use of DataFrame
> >> > instead of Datasource?
> >> >
> >> > Or is that just a holdover from code written before the move / type
> >> > alias?
> >>
> >
>
