[
https://issues.apache.org/jira/browse/CALCITE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ZiLin Chen updated CALCITE-4682:
--------------------------------
Summary: Cost of limit operator may break total cost assumption (was:
Cost of operator limit may break total cost assumption)
> Cost of limit operator may break total cost assumption
> ------------------------------------------------------
>
> Key: CALCITE-4682
> URL: https://issues.apache.org/jira/browse/CALCITE-4682
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: ZiLin Chen
> Priority: Major
>
> The total-cost assumption means each operator would run to completion, so the
> actual running cost of a plan equals its cumulative cost.
> Is there any way Calcite can address the cost of the limit operator?
> A limit operator can affect which algorithm the join below it should choose,
> based on startup cost rather than cumulative cost.
> However, the Volcano planner only deals with total cost. That works well for
> OLAP systems, but OLTP systems need a cost model that accounts for startup
> cost to solve this problem.
>
> Given a SQL query like: SELECT * FROM A JOIN B ON A.id = B.id LIMIT 1
> Limit
>   - Join
>     - TableScan A (10,000,000 rows)
>     - TableScan B (10,000,000 rows)
>
> Consider two join algorithms: HashJoin and IndexNestedLoopJoin.
> IndexNestedLoopJoin is more efficient than HashJoin here, because we only
> need to fetch a few rows from A and then do index lookups into B; once the
> join output reaches the limit fetch size, execution stops. HashJoin, in
> contrast, must first build a hash table over 10,000,000 rows.
>
> Now, coming to the VolcanoPlanner: the best cost of this query is computed as
> the cost of the limit operator plus the cumulative cost of the join. If we
> only consider cumulative cost, HashJoin would likely be chosen for a join of
> two large tables.
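The asymmetry described above can be sketched with a toy cost model (a hypothetical illustration only, not Calcite's actual RelOptCost API): give each operator a startup cost and a per-row cost, and let a LIMIT cap how many output rows the plan must pay for.

```python
# Hypothetical toy cost model (NOT Calcite's real cost interface): separating
# startup cost from per-row cost shows why LIMIT flips the preferred join.
from dataclasses import dataclass

@dataclass
class Cost:
    startup: float   # work done before the first row can be emitted
    per_row: float   # incremental work per output row

    def to_produce(self, n: int) -> float:
        """Cost of producing the first n output rows."""
        return self.startup + self.per_row * n

TABLE_ROWS = 10_000_000

# HashJoin must build a hash table over all of B before emitting its first
# row: huge startup cost, cheap per-row cost afterwards.
hash_join = Cost(startup=TABLE_ROWS, per_row=1.0)

# IndexNestedLoopJoin has no build phase, but every output row pays an
# index lookup into B: near-zero startup cost, higher per-row cost.
index_nl_join = Cost(startup=0.0, per_row=10.0)

# Run-to-completion (cumulative) cost: HashJoin wins.
print(hash_join.to_produce(TABLE_ROWS))      # 20000000.0
print(index_nl_join.to_produce(TABLE_ROWS))  # 100000000.0

# Under LIMIT 1 only the first row matters: IndexNestedLoopJoin wins.
print(hash_join.to_produce(1))      # 10000001.0
print(index_nl_join.to_produce(1))  # 10.0
```

A planner that compares only the run-to-completion numbers will always pick HashJoin here; only a cost model that keeps the startup component separate can see that IndexNestedLoopJoin is cheaper under LIMIT 1.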
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)