[
https://issues.apache.org/jira/browse/CALCITE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295831#comment-17295831
]
hqx commented on CALCITE-4522:
------------------------------
I used a table in my local MySQL instance for a test:
CREATE TABLE `t_emp` (
`empid` int(11) NOT NULL,
`name` varchar(45) NOT NULL,
PRIMARY KEY (`empid`)
) ENGINE=InnoDB;
Then I executed two queries:
Q1: select empid from t_emp limit 10
Q2: select empid from t_emp order by empid limit 10
I got the following two physical plans:
Plan for Q1
JdbcProject(empid=[$0]): rowcount = 10.0, cumulative cost = {117.0 rows, 109.0 cpu, 0.0 io}, id = 55
  JdbcSort(fetch=[10], collations=[]): rowcount = 10.0, cumulative cost = {109.0 rows, 266.7861 cpu, 0.0 io}, id = 54
    JdbcTableScan(table=[[test, t_emp]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1
Plan for Q2
JdbcProject(empid=[$0]): rowcount = 10.0, cumulative cost = {117.0 rows, 274.7861266955713 cpu, 0.0 io}, id = 113
  JdbcSort(sort0=[$0], dir0=[ASC], fetch=[10]): rowcount = 10.0, cumulative cost = {109.0 rows, 266.7861 cpu, 0.0 io}, id = 112
    JdbcTableScan(table=[[test, t_emp]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1
As you can see, the cpu cost of the JdbcSort itself is about 166
(266.79 - 101) in both plans, even when the RelCollation#fieldCollations
of the JdbcSort is empty.
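
The figure of about 166 can be reproduced from the n * log(n) * rowBytes formula quoted below. This is only a rough sketch: the 4 bytes per field, the field count of 2 (empid and name), the natural logarithm, and the 0.9 multiplier for the JDBC variant of the sort are assumptions chosen because they match the numbers in the plans above, not values stated in this thread.

```python
import math


def n_log_n(n: float) -> float:
    """n * ln(n); returns n itself for n < e so small inputs never cost less than n."""
    return n if n < math.e else n * math.log(n)


def old_sort_cpu(output_rows: float, field_count: int,
                 bytes_per_field: float = 4.0,   # assumed average width of one field
                 jdbc_factor: float = 0.9) -> float:  # assumed JDBC cost multiplier
    """Old-style estimate: based on the sort's OUTPUT row count, ignoring the collation."""
    row_bytes = field_count * bytes_per_field
    return n_log_n(output_rows) * row_bytes * jdbc_factor


# fetch=10 gives 10 output rows; t_emp has 2 columns.
estimate = old_sort_cpu(output_rows=10, field_count=2)
print(round(estimate, 4))  # 165.7861, i.e. 266.7861 - 101.0
```

Note that the collation never enters the calculation, which is why Q1 and Q2 get the same sort cost.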
> optimize sort cost formula
> --------------------------
>
> Key: CALCITE-4522
> URL: https://issues.apache.org/jira/browse/CALCITE-4522
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: hqx
> Priority: Minor
>
> The current method of computing the cost of a sort has two problems.
> # When there is no need to sort, it still charges the cpu cost of a sort.
> # Using n * log(n) * rowBytes to estimate the cpu cost may be inaccurate,
> where n is the output row count of the sort operator and rowBytes is
> the average number of bytes per row.
> Instead, I suggest the following.
> # The cpu cost is zero if there is no need to sort.
> # Use m * log(n) * rowBytes to compute the cpu cost, where m is the sum of
> offset and limit and n is the input row count.
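
The two suggestions above could be sketched roughly as follows. The function name, its parameters, the use of the natural logarithm, and the guard for inputs below one row are illustrative assumptions, not the actual change to Calcite.

```python
import math


def proposed_sort_cpu(input_rows: float, offset: int, limit: int,
                      row_bytes: float, has_collation: bool) -> float:
    """Hypothetical sketch of the suggested formula: m * log(n) * rowBytes."""
    if not has_collation:
        # Suggestion 1: nothing to sort (pure LIMIT/OFFSET), so no sort cpu cost.
        return 0.0
    m = offset + limit              # rows that must be kept in sorted order
    n = max(input_rows, 1.0)        # assumed guard so log(n) is never negative
    # Suggestion 2: cost depends on the INPUT row count n, not the output row count.
    return m * math.log(n) * row_bytes


# Q1: limit 10 without ORDER BY, 100 input rows, 8 bytes per row (assumed).
print(proposed_sort_cpu(100, 0, 10, 8.0, has_collation=False))
# Q2: order by empid limit 10 over the same input.
print(proposed_sort_cpu(100, 0, 10, 8.0, has_collation=True))
```

With these assumed inputs, Q1 costs nothing for the sort while Q2 is charged for keeping the top 10 of 100 input rows, so the two plans would no longer get the same JdbcSort cost.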