Re: Project + Sort on single and on multiple columns

Julian Hyde Fri, 10 Nov 2017 18:06:31 -0800

While the cost of Project does not depend heavily on the number of input 
columns, the cost of Sort (or at least a typical Sort algorithm such as 
external merge sort) does depend on the number of columns (or more precisely on 
the average row size in bytes). So, if the Project reduces the number of 
columns (as most Projects do) then the Sort will have lower cost if performed 
after the Project, because it is handling fewer bytes.
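
A rough sketch of this reasoning in plain Java follows. It is not Calcite's actual cost model: the class name, the method names, and the "n log n times bytes per row" formula are assumptions made only for illustration.

```java
// Illustrative only: a toy cost model, not Calcite's RelOptCost machinery.
public final class SortCostSketch {

    /**
     * Assumed cost of an external merge sort: roughly n * log2(n)
     * comparisons, each weighted by the bytes that have to be moved per row.
     */
    static double sortCost(double rowCount, double avgRowSizeBytes) {
        if (rowCount <= 1) {
            return 0d;
        }
        double log2n = Math.log(rowCount) / Math.log(2);
        return rowCount * log2n * avgRowSizeBytes;
    }

    /** Assumed cost of a Project: one unit per row, regardless of input width. */
    static double projectCost(double rowCount) {
        return rowCount;
    }

    /** Project first, so the Sort only handles the narrow output rows. */
    static double projectThenSort(double rows, double inputBytes, double outputBytes) {
        return projectCost(rows) + sortCost(rows, outputBytes);
    }

    /** Sort first, so the Sort handles the full-width input rows. */
    static double sortThenProject(double rows, double inputBytes, double outputBytes) {
        return sortCost(rows, inputBytes) + projectCost(rows);
    }

    public static void main(String[] args) {
        double rows = 1_000_000d;
        double wideRow = 200d;   // hypothetical average bytes per row before the Project
        double narrowRow = 16d;  // hypothetical average bytes per row after the Project
        System.out.printf("Project then Sort: %.0f%n",
            projectThenSort(rows, wideRow, narrowRow));
        System.out.printf("Sort then Project: %.0f%n",
            sortThenProject(rows, wideRow, narrowRow));
    }
}
```

With a model like this, the two orderings cost the same only when the Project does not narrow the rows. As soon as the output rows are smaller than the input rows, the plan with the Sort after the Project is cheaper.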



> On Nov 10, 2017, at 10:32 AM, Luis Fernando Kauer 
> <[email protected]> wrote:
> 
> I'm trying to fix https://issues.apache.org/jira/browse/CALCITE-1906 and I'm 
> facing a similar problem.
> After managing to make JdbcSort work, I sometimes see the JdbcProject above 
> JdbcSort, and the generated SQL is wrong. RelToSqlConverter uses the 
> SqlImplementor.Clause enum to decide when to create subqueries, but since 
> ORDER_BY comes after SELECT, by the time it reaches JdbcProject it can't 
> reuse the same query because ORDER_BY has already been used.
> The rule responsible for this is SortProjectTransposeRule. The opposite rule 
> is ProjectSortTransposeRule, but that one only matches if the sort node is 
> exactly Sort.class, so it ends up not matching.
> Is pushing the Project above Sort usually a good final plan, or is it done to 
> allow other rules to match? If it is not, maybe we should solve this in the 
> core project.
> 
> 
> 
> On Friday, November 10, 2017, 15:57:34 BRST, Michael Mior 
> <[email protected]> wrote:
> 
> Since the cost of the project doesn't depend on the number of columns being
> projected or the size of the input, putting the project before or after the
> sort will result in the same estimated cost. One approach would be to scale
> the cost of the projection based on the fraction of columns projected.
> 
> --
> Michael Mior
> [email protected]
> 
> 2017-11-10 12:42 GMT-05:00 Christian Tzolov <[email protected]>:
> 
>> I've observed in my no-sql adapter implementation that for queries with
>> Project + Sort by ONE column the Project is pushed (as expected) before
>> the Sort, but for Sort on MULTIPLE columns the Sort is before the Project.
>> For example, for a query with one sort column:
>> 
>> SELECT yearPublished FROM BookMaster ORDER BY yearPublished ASC
>> 
>> The plan looks as expected (Project before the Sort):
>> 
>> 
>> PLAN=GeodeToEnumerableConverterRel
>>   *GeodeSortRel*(sort0=[$0], dir0=[ASC])
>>     GeodeProjectRel(yearPublished=[$2])
>>       GeodeTableScanRel(table=[[TEST, BookMaster]])
>> 
>> But for a sort with two columns:
>> 
>> SELECT yearPublished, itemNumber from BookMaster ORDER BY yearPublished
>> ASC, itemNumber ASC
>> 
>> The plan is:
>> 
>> 
>> PLAN=GeodeToEnumerableConverterRel
>>   GeodeProjectRel(yearPublished=[$2], itemNumber=[$0])
>>     *GeodeSortRel*(sort0=[$2], sort1=[$0], dir0=[ASC], dir1=[ASC])
>>       GeodeTableScanRel(table=[[TEST, BookMaster]])
>> 
>> I'm not sure I can explain why in the second case the Sort appears before
>> the Project. Here are my cost functions:
>> 
>> * GeodeSortRel:
>> https://github.com/tzolov/calcite/blob/geode-1.3/geode/src/main/java/org/apache/calcite/adapter/geode/rel/GeodeSortRel.java#L51
>> 
>> * GeodeProjectRel:
>> https://github.com/tzolov/calcite/blob/4a631d9055340f64f5e644454551f964ea08f9de/geode/src/main/java/org/apache/calcite/adapter/geode/rel/GeodeProjectRel.java#L52
>> 
>> 
>> 
>> Cheers,
>> Christian
