Yes, the project will reduce the amount of data required to be sorted, but
the cost model needs to reflect this. My mistake, though: Project already
implements this behaviour by multiplying the estimated number of rows by
the number of projected columns.

In short, I'm not sure why you're experiencing this behaviour, but maybe
others have some insights.
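
To make the cost argument concrete, here is a minimal toy sketch in plain
Java. It is not Calcite's API: the class and method names are hypothetical,
the project cost follows the "rows times projected columns" estimate
described above, and the n*log(n) sort cost scaled by column count is an
assumption made purely for illustration. It shows that the two plan orders
only differ in estimated cost when the sort cost depends on row width.

```java
public class ProjectSortCost {
    // Toy project cost: estimated rows times number of projected columns.
    static double projectCost(double rows, int projectedColumns) {
        return rows * projectedColumns;
    }

    // Toy sort cost (assumption): n * log(n), optionally scaled by the
    // number of columns in each row being sorted.
    static double sortCost(double rows, int columns, boolean widthAware) {
        double base = rows * Math.log(rows);
        return widthAware ? base * columns : base;
    }

    // Plan A: Project first, so the Sort only sees the projected columns.
    static double projectThenSort(double rows, int projectedColumns,
                                  boolean widthAware) {
        return projectCost(rows, projectedColumns)
            + sortCost(rows, projectedColumns, widthAware);
    }

    // Plan B: Sort first, so the Sort sees every column of the input.
    static double sortThenProject(double rows, int totalColumns,
                                  int projectedColumns, boolean widthAware) {
        return sortCost(rows, totalColumns, widthAware)
            + projectCost(rows, projectedColumns);
    }

    public static void main(String[] args) {
        double rows = 1000;
        int totalColumns = 10;
        int projectedColumns = 2;
        for (boolean widthAware : new boolean[] {false, true}) {
            System.out.println("widthAware=" + widthAware
                + " projectThenSort="
                + projectThenSort(rows, projectedColumns, widthAware)
                + " sortThenProject="
                + sortThenProject(rows, totalColumns, projectedColumns,
                                  widthAware));
        }
    }
}
```

With the width-insensitive sort cost both orders come out the same, so a
planner would have no reason to prefer one; with the width-aware variant
the project-first plan is cheaper.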

--
Michael Mior
[email protected]

2017-11-10 13:07 GMT-05:00 Christian Tzolov <[email protected]>:

> Not sure I understand. Let's say that we have rows of size 10MB each. IMO it
> is not the same if we perform the sort on the complete row (e.g. 10MB) or
> on the subset left after the Project. E.g. my intuition is that the Project
> will shrink the size and therefore should be performed before the Sort (as
> it does for a single-column sort).
> Or is this intuition wrong?
>
>
> On 10 November 2017 at 18:57, Michael Mior <[email protected]> wrote:
>
> > Since the cost of the project doesn't depend on the number of columns being
> > projected or the size of the input, putting the project before or after the
> > sort will result in the same estimated cost. One approach would be to scale
> > the cost of the projection based on the fraction of columns projected.
> >
> > --
> > Michael Mior
> > [email protected]
> >
> > 2017-11-10 12:42 GMT-05:00 Christian Tzolov <[email protected]>:
> >
> > > I've observed in my no-sql adapter implementation that for queries with
> > > Project + Sort by ONE column the Project is pushed (as expected) before
> > > the Sort, but for Sort on MULTIPLE columns the Sort is before the
> > > Project. For example, for a query with one sort column:
> > >
> > > SELECT yearPublished FROM BookMaster ORDER BY yearPublished ASC
> > >
> > > The plan looks as expected (project before the sort):
> > >
> > >
> > > PLAN=GeodeToEnumerableConverterRel
> > >   *GeodeSortRel*(sort0=[$0], dir0=[ASC])
> > >        GeodeProjectRel(yearPublished=[$2])
> > >            GeodeTableScanRel(table=[[TEST, BookMaster]])
> > >
> > > But for a sort with two columns:
> > >
> > > SELECT yearPublished, itemNumber from BookMaster ORDER BY yearPublished
> > > ASC, itemNumber ASC
> > >
> > > the plan is:
> > >
> > >
> > > PLAN=GeodeToEnumerableConverterRel
> > >   GeodeProjectRel(yearPublished=[$2], itemNumber=[$0])
> > >     *GeodeSortRel*(sort0=[$2], sort1=[$0], dir0=[ASC], dir1=[ASC])
> > >       GeodeTableScanRel(table=[[TEST, BookMaster]])
> > >
> > > I'm not sure I can explain why in the second case the Sort appears
> > > before the Project.
> > > Here are my cost functions:
> > >
> > > * GeodeSortRel:
> > > https://github.com/tzolov/calcite/blob/geode-1.3/geode/src/main/java/org/apache/calcite/adapter/geode/rel/GeodeSortRel.java#L51
> > >
> > > * GeodeProjectRel:
> > > https://github.com/tzolov/calcite/blob/4a631d9055340f64f5e644454551f964ea08f9de/geode/src/main/java/org/apache/calcite/adapter/geode/rel/GeodeProjectRel.java#L52
> > >
> > > Cheers,
> > > Christian
> > >
> >
>
>
>
> --
> Christian Tzolov <http://www.linkedin.com/in/tzolov> | Principle Software
> Engineer | Pivotal <http://pivotal.io/> | [email protected] |+31610285517
>
