Konstantin Orlov created IGNITE-23968:
-----------------------------------------

             Summary: Sql. Improve row count estimation for joins
                 Key: IGNITE-23968
                 URL: https://issues.apache.org/jira/browse/IGNITE-23968
             Project: Ignite
          Issue Type: Improvement
          Components: sql
            Reporter: Konstantin Orlov


The current row count estimation significantly underestimates the result set of 
joins, which causes the optimizer to pick suboptimal plans in certain cases.

For example, consider the query below:

{code:sql}
create table CATALOG_RETURNS
(
    <...>
    constraint CATALOG_RETURNS_PK
        primary key (CR_ITEM_SK, CR_ORDER_NUMBER)
);

create table CATALOG_SALES
(
    <...>
    constraint CATALOG_RETURNS_PK
        primary key (CS_ITEM_SK, CS_ORDER_NUMBER)
);

explain plan for
select *
  from catalog_sales
      ,catalog_returns
  where cs_item_sk = cr_item_sk
    and cs_order_number = cr_order_number;
--------------
HashJoin(...): rowcount = 22500.0
  TableScan(table=[[PUBLIC, CATALOG_RETURNS]]): rowcount = 1000000.0
  Exchange(...): <...>
    TableScan(table=[[PUBLIC, CATALOG_SALES]]): rowcount = 1000000.0
{code}

When joining two tables with 1 million rows each on their primary keys, the 
estimated result set size is only 22.5k rows. Things get even worse when there 
are several joins with dimension tables: after a few joins the estimated result 
set is close to 1 (a single row). 
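
To illustrate how quickly the estimate collapses, here is a minimal sketch. It 
rests on assumptions that are not stated in this ticket: that the estimator 
multiplies the input row count by a fixed selectivity guess of 0.15 per equality 
predicate (the default guess in Apache Calcite's RelMdUtil.guessSelectivity), 
and that every join in the chain has two equality predicates. Under those 
assumptions the first join gives 1,000,000 * 0.15 * 0.15 = 22,500 rows, which 
matches the plan above. The class and method names are hypothetical.

```java
public class JoinEstimateCollapse {
    // Assumption: fixed selectivity guess applied per equality predicate.
    public static final double EQUALS_SELECTIVITY = 0.15;

    /**
     * Applies the per-join selectivity repeatedly to an input row count.
     * The estimate is clamped at 1 row, since a row count below 1 is
     * not meaningful for planning.
     */
    public static double estimate(double inputRows, int equalityPredicates, int joins) {
        double perJoinSelectivity = Math.pow(EQUALS_SELECTIVITY, equalityPredicates);
        double rows = inputRows;
        for (int i = 0; i < joins; i++) {
            rows = Math.max(1.0, rows * perJoinSelectivity);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Print the estimate after 1..4 consecutive two-column equi-joins.
        for (int joins = 1; joins <= 4; joins++) {
            System.out.printf("joins=%d estimate=%.2f%n",
                    joins, estimate(1_000_000.0, 2, joins));
        }
    }
}
```

With these numbers the estimate reaches the 1-row floor by the fourth join, 
regardless of how many rows the joined tables actually contain.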

Given that we don't support foreign keys and don't have proper statistics yet, 
we need to introduce heuristics to improve row count estimation for joins.
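
As one possible shape for such a heuristic (a sketch only, not the approach 
decided in this ticket): when the equi-join columns cover the whole primary key 
of one side, each row of the other side can match at most one row, so the other 
side's row count is an upper bound on the result. The sketch below uses that 
bound as the estimate, which additionally assumes that every row finds a match, 
as a foreign key would guarantee. All names, including the fallback selectivity 
parameter, are hypothetical.

```java
import java.util.Set;

public class PkJoinHeuristicSketch {
    /**
     * Estimates the row count of an inner equi-join.
     * If the join keys of a side cover its primary key, that side is
     * unique on the join keys and the other side's row count bounds the result.
     * Otherwise falls back to a selectivity-based guess.
     */
    public static double estimateJoinRows(
            double leftRows, double rightRows,
            Set<String> leftJoinKeys, Set<String> rightJoinKeys,
            Set<String> leftPk, Set<String> rightPk,
            double fallbackSelectivity) {
        boolean leftUnique = !leftPk.isEmpty() && leftJoinKeys.containsAll(leftPk);
        boolean rightUnique = !rightPk.isEmpty() && rightJoinKeys.containsAll(rightPk);

        if (leftUnique && rightUnique) {
            // One-to-one at most: bounded by the smaller input.
            return Math.min(leftRows, rightRows);
        }
        if (leftUnique) {
            // Each right row matches at most one left row.
            return rightRows;
        }
        if (rightUnique) {
            // Each left row matches at most one right row.
            return leftRows;
        }
        // No uniqueness information: hypothetical selectivity-based fallback.
        return Math.max(1.0, leftRows * rightRows * fallbackSelectivity);
    }

    public static void main(String[] args) {
        Set<String> salesPk = Set.of("CS_ITEM_SK", "CS_ORDER_NUMBER");
        Set<String> returnsPk = Set.of("CR_ITEM_SK", "CR_ORDER_NUMBER");
        // Both sides are joined on their full primary keys.
        System.out.println(estimateJoinRows(
                1_000_000.0, 1_000_000.0, salesPk, returnsPk, salesPk, returnsPk, 0.0225));
    }
}
```

For the query above, both sides are joined on their full primary keys, so this 
sketch would estimate 1,000,000 rows instead of 22,500.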



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
