[jira] [Commented] (CALCITE-4682) Cost of Operator limit may break cumulative cost calculation assumption

Julian Hyde (Jira) Sun, 11 Jul 2021 17:54:11 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378817#comment-17378817
 ]


Julian Hyde commented on CALCITE-4682:
--------------------------------------

In Calcite "cumulative cost" means the cost of an operator plus all of its 
descendants (input operators, and their inputs). You seem to be using the term 
for "the cost of an operator assuming that it runs to completion", which is 
different. Let's find another word for that, please.
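
To make the definition above concrete, here is a minimal sketch of
"cumulative cost" in that sense: an operator's own cost plus the cumulative
cost of each of its inputs, recursively. The `Op` class, the
`cumulative_cost` function, and all cost numbers are hypothetical and chosen
only for illustration; this is not Calcite's API or its cost model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Hypothetical plan node; not a Calcite class."""
    name: str
    self_cost: float                      # cost of this operator alone (made-up units)
    inputs: List["Op"] = field(default_factory=list)


def cumulative_cost(op: Op) -> float:
    """Cost of `op` plus the cumulative cost of all of its descendants."""
    return op.self_cost + sum(cumulative_cost(i) for i in op.inputs)


# Plan shape from the issue: Limit -> Join -> (TableScan A, TableScan B).
# The numbers are invented for illustration.
scan_a = Op("TableScan A", 10.0)
scan_b = Op("TableScan B", 10.0)
join = Op("Join", 25.0, [scan_a, scan_b])
limit = Op("Limit", 1.0, [join])

print(cumulative_cost(join))   # 25 + 10 + 10
print(cumulative_cost(limit))  # 1 + cumulative cost of the join
```

Note that nothing in this definition says how many rows an operator
produces before it stops; that is the separate notion the issue is about.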

> Cost of Operator limit may break cumulative cost calculation assumption
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-4682
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4682
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: ZiLin Chen
>            Priority: Major
>
> Is there any way Calcite can address the cost of the limit operator?
> A limit operator can change which algorithm the join below it should use, 
> because the choice then depends on start-up cost rather than cumulative cost.
> However, the Volcano planner only deals with total cost. That works well 
> in OLAP systems, but OLTP systems need a cost model that addresses this 
> problem.
>  
> Take a SQL query such as: select * from A join B on A.id = B.id limit 1
> Limit
>  - Join 
>   - TableScan A  (10,000,000 rows)
>   - TableScan B  (10,000,000 rows)
>  
> Consider two join algorithms, HashJoin and IndexNestedLoopJoin.
> IndexNestedLoopJoin is more efficient than HashJoin here, because we only 
> need to fetch a few rows from A and look each one up in B through an index. 
> Once the join output reaches the limit's fetch size, execution is over. 
> HashJoin, by contrast, needs to build a hash table with 10,000,000 rows.
>  
> Now consider VolcanoPlanner: the best cost of this query is computed as the 
> cost of the limit operator plus the cumulative cost of the join. If we only 
> consider cumulative cost, HashJoin would likely be chosen for a join of two 
> large tables.
>  
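
The trade-off described in the issue can be sketched with a toy model that
splits each join algorithm's cost into work done before the first output row
and work done per output row. Everything here is an assumption made for
illustration: the `JoinAlgo` class, the cost units, the guess that an index
lookup costs about log2(N), and the simplification that every row of A
matches exactly one row of B. It is not how VolcanoPlanner computes cost.

```python
import math
from dataclasses import dataclass


@dataclass
class JoinAlgo:
    """Hypothetical two-part cost model; not a Calcite class."""
    name: str
    startup_cost: float   # work done before the first output row
    per_row_cost: float   # work done for each output row

    def cost_for(self, rows_needed: int) -> float:
        """Cost of producing only the first `rows_needed` output rows."""
        return self.startup_cost + rows_needed * self.per_row_cost


N = 10_000_000  # rows in A and in B, as in the issue

# Assumption: HashJoin builds a hash table over all of B, then probes cheaply.
hash_join = JoinAlgo("HashJoin", startup_cost=float(N), per_row_cost=1.0)
# Assumption: IndexNestedLoopJoin has no build phase; each lookup costs ~log2(N).
index_nlj = JoinAlgo("IndexNestedLoopJoin", startup_cost=0.0,
                     per_row_cost=math.log2(N))


def cheapest(rows_needed: int) -> str:
    """Name of the algorithm with the lower cost for `rows_needed` rows."""
    return min((hash_join, index_nlj),
               key=lambda a: a.cost_for(rows_needed)).name


print(cheapest(N))  # join runs to completion
print(cheapest(1))  # join sits under "limit 1"
```

Under these assumptions the hash join is cheaper when the join runs to
completion, and the index nested-loop join is cheaper when only one row is
needed. A cost that ignores how many rows the parent will actually pull
cannot express that difference.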



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
