[jira] [Commented] (CALCITE-4522) optimize sort cost formula

hqx (Jira) Thu, 04 Mar 2021 23:48:46 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295831#comment-17295831
 ]


hqx commented on CALCITE-4522:
------------------------------

I ran a test against a table in my local MySQL database:

  CREATE TABLE `t_emp` (
    `empid` int(11) NOT NULL,
    `name` varchar(45) NOT NULL,
    PRIMARY KEY (`empid`)
  ) ENGINE=InnoDB;

 

Then I executed two queries:

    Q1:  select empid from t_emp limit 10

    Q2:  select empid from t_emp order by empid limit 10

 

I got the following two physical plans:

Plan for Q1

    JdbcProject(empid=[$0]): rowcount = 10.0, cumulative cost = {117.0 rows, 109.0 cpu, 0.0 io}, id = 55
      JdbcSort(fetch=[10], collations=[]): rowcount = 10.0, cumulative cost = {109.0 rows, 266.7861 cpu, 0.0 io}, id = 54
        JdbcTableScan(table=[[test, t_emp]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1

Plan for Q2

    JdbcProject(empid=[$0]): rowcount = 10.0, cumulative cost = {117.0 rows, 274.7861266955713 cpu, 0.0 io}, id = 113
      JdbcSort(sort0=[$0], dir0=[ASC], fetch=[10]): rowcount = 10.0, cumulative cost = {109.0 rows, 266.7861 cpu, 0.0 io}, id = 112
        JdbcTableScan(table=[[test, t_emp]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1

 

As you can see, the CPU cost of the JdbcSort itself is about 166 (266.7861 - 101.0) in both plans,
even though the RelCollation#fieldCollations of the JdbcSort in Q1 is empty, so no sorting is actually needed.
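The value of about 166 can be reproduced by hand. The following is a worked check, not Calcite code. It assumes that Sort#computeSelfCost (at the time of this comment) charges nLogN(outputRows) * fieldCount * 4 CPU, that Util.nLogN uses the natural logarithm, and that JdbcSort multiplies its self cost by 0.9. The class name CurrentSortCost and its members are illustrative only.

```java
/**
 * Worked check of the JdbcSort CPU cost in the plans above.
 * Assumptions (not stated in the comment): Sort#computeSelfCost uses
 * nLogN(outputRows) * fieldCount * 4, Util.nLogN uses the natural log,
 * and JdbcSort scales its self cost by 0.9.
 */
final class CurrentSortCost {
  // Assumed JdbcSort discount applied on top of the generic Sort cost.
  static final double JDBC_SORT_MULTIPLIER = 0.9;

  // Assumed behaviour of Util.nLogN: d * ln(d), clamped to 1 for d < 1.
  static double nLogN(double d) {
    return d < 1 ? 1 : d * Math.log(d);
  }

  // CPU part of the assumed Sort cost. The collation is never consulted,
  // so an empty collation costs exactly as much as a real sort.
  static double sortCpu(double outputRows, int fieldCount) {
    double bytesPerRow = fieldCount * 4;
    return nLogN(outputRows) * bytesPerRow;
  }

  public static void main(String[] args) {
    double outputRows = 10; // fetch=[10]
    int fieldCount = 2;     // empid, name: the project sits above the sort
    double selfCpu = sortCpu(outputRows, fieldCount) * JDBC_SORT_MULTIPLIER;
    double scanCpu = 101.0; // cumulative CPU of the JdbcTableScan
    System.out.printf("JdbcSort self cpu = %.4f%n", selfCpu);
    System.out.printf("cumulative cpu    = %.4f%n", scanCpu + selfCpu);
  }
}
```

Under these assumptions, 10 * ln(10) * 8 is about 184.21, and the 0.9 factor brings it to about 165.79. Adding the scan's 101.0 gives 266.7861, the value shown for both JdbcSort nodes. The same 0.9 factor would also explain why the JdbcSort adds 9 rather than 10 to the cumulative row count.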

 

> optimize sort cost formula
> --------------------------
>
>                 Key: CALCITE-4522
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4522
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: hqx
>            Priority: Minor
>
> The old method to compute the cost of sort has some problems.
>  # When there is no need to sort, it still computes the CPU cost of sorting.
>  # Using n * log(n) * rowBytes to estimate the CPU cost may be inaccurate,
> where n is the output row count of the sort operator and rowBytes is
> the average size of one row in bytes.
> Instead, I suggest the following:
>  # The CPU cost is zero if there is no need to sort.
>  # Use m * log(n) * rowBytes to compute the CPU cost, where m is
> offset + limit and n is the input row count.
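The two proposed rules can be sketched as follows. This is a minimal, hypothetical sketch, not Calcite API: ProposedSortCost and proposedSortCpu are illustrative names. It assumes that "no need to sort" means an empty collation, that log is the natural logarithm (as in Util.nLogN), that rowBytes is 4 bytes per field as in the current formula, and that m falls back to n (and is capped at n) when there is no LIMIT or the limit exceeds the input.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the cost formula proposed in CALCITE-4522; not Calcite code.
final class ProposedSortCost {
  // Assumption: rowBytes is estimated as 4 bytes per field, as in the current formula.
  static double bytesPerRow(int fieldCount) {
    return fieldCount * 4.0;
  }

  /**
   * CPU cost of a sort under the proposed rules.
   *
   * @param collationIsEmpty true when the sort has no field collations
   * @param inputRows        n, the estimated row count of the sort's input
   * @param offset           number of rows skipped (0 if there is no OFFSET)
   * @param fetch            the LIMIT value, or empty if there is no LIMIT
   * @param rowBytes         average size of one row in bytes
   */
  static double proposedSortCpu(boolean collationIsEmpty, double inputRows,
      long offset, OptionalLong fetch, double rowBytes) {
    // Rule 1: nothing to order, so no sort CPU cost.
    if (collationIsEmpty) {
      return 0.0;
    }
    // Rule 2: m = offset + limit. Assumption: without a LIMIT every input
    // row is produced, and m never exceeds n.
    double m = fetch.isPresent()
        ? Math.min(offset + fetch.getAsLong(), inputRows)
        : inputRows;
    // Clamp n to at least 1 so the logarithm is never negative.
    double logN = Math.log(Math.max(inputRows, 1.0));
    return m * logN * rowBytes;
  }

  public static void main(String[] args) {
    double rowBytes = bytesPerRow(2); // t_emp has two columns: empid, name
    // Q1: LIMIT 10 without ORDER BY, so the collation is empty.
    double q1 = proposedSortCpu(true, 100, 0, OptionalLong.of(10), rowBytes);
    // Q2: ORDER BY empid LIMIT 10 over a 100-row scan.
    double q2 = proposedSortCpu(false, 100, 0, OptionalLong.of(10), rowBytes);
    System.out.printf("Q1 sort cpu = %.4f%n", q1);
    System.out.printf("Q2 sort cpu = %.4f%n", q2);
  }
}
```

With the numbers from the plans above (n = 100, m = 10, two fields), Q1 costs 0 and Q2 costs 10 * ln(100) * 8, about 368.41, before any JdbcSort multiplier is applied.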



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
