Konstantin Orlov created IGNITE-23968:
-----------------------------------------
Summary: Sql. Improve row count estimation for joins
Key: IGNITE-23968
URL: https://issues.apache.org/jira/browse/IGNITE-23968
Project: Ignite
Issue Type: Improvement
Components: sql
Reporter: Konstantin Orlov
The current row count estimation significantly underestimates the result set
size of joins, which causes the optimizer to pick suboptimal plans in certain cases.
For example, consider the query below:
{code:sql}
create table CATALOG_RETURNS
(
    <...>
    constraint CATALOG_RETURNS_PK
        primary key (CR_ITEM_SK, CR_ORDER_NUMBER)
);
create table CATALOG_SALES
(
    <...>
    constraint CATALOG_RETURNS_PK
        primary key (CS_ITEM_SK, CS_ORDER_NUMBER)
);
explain plan for
select *
  from catalog_sales
      ,catalog_returns
 where cs_item_sk = cr_item_sk
   and cs_order_number = cr_order_number;
--------------
HashJoin(...): rowcount = 22500.0
  TableScan(table=[[PUBLIC, CATALOG_RETURNS]]): rowcount = 1000000.0
  Exchange(...): <...>
    TableScan(table=[[PUBLIC, CATALOG_SALES]]): rowcount = 1000000.0
{code}
When joining two tables with 1,000,000 rows each by primary key, the estimated
result set size is only 22,500 rows. Things get even worse when there are several
joins with dimension tables: after a few joins, the estimated result set size is
close to 1 (a single row).
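To make the collapse concrete, here is a rough sketch. It assumes the estimator multiplies the running row count by a fixed selectivity guess for every equality predicate in the join condition. The value 0.15 is Calcite's default guess for an equality predicate; whether Ignite arrives at 22,500 exactly this way is an assumption, although the arithmetic matches (1,000,000 × 0.15 × 0.15 = 22,500). The function name and the model itself, which ignores the size of the other join input, are hypothetical simplifications.

```python
# Assumed default selectivity guess for a single equality predicate.
EQUALS_SELECTIVITY = 0.15


def chained_join_estimate(fact_rows, joins, predicates_per_join=2):
    """Return the estimated row count after each of `joins` successive joins.

    Hypothetical model: every join multiplies the running estimate by
    EQUALS_SELECTIVITY once per equality predicate in its condition.
    """
    per_join = EQUALS_SELECTIVITY ** predicates_per_join
    estimates = []
    rows = float(fact_rows)
    for _ in range(joins):
        rows *= per_join
        estimates.append(rows)
    return estimates


if __name__ == "__main__":
    for i, est in enumerate(chained_join_estimate(1_000_000, 4), start=1):
        print(f"after join {i}: {est:.2f} rows")
```

Under these assumptions the estimate drops to roughly 22,500 after one join, about 506 after two, about 11 after three, and below 1 after four, which is the behaviour described above.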
Given that we support neither foreign keys nor proper statistics yet, we need
to introduce heuristics to improve row count estimation for joins.
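As an illustration only, and not necessarily the heuristic this ticket will adopt, one option is to use primary key information. When the equi-join columns cover the primary key of one input, each row of the other input matches at most one row, so the other input's row count is an upper bound for an inner join and can serve as the estimate. All names below are hypothetical, and the fallback formula is a placeholder for whatever the existing estimator does.

```python
def estimate_join_rows(left_rows, right_rows,
                       left_join_cols, right_join_cols,
                       left_pk, right_pk,
                       fallback_selectivity=0.15):
    """Hypothetical inner equi-join row count heuristic based on primary keys."""
    # An input is "unique on the join keys" if its whole PK is among them.
    left_unique = bool(left_pk) and set(left_pk) <= set(left_join_cols)
    right_unique = bool(right_pk) and set(right_pk) <= set(right_join_cols)

    if left_unique and right_unique:
        # At most one match on both sides: bounded by the smaller input.
        return float(min(left_rows, right_rows))
    if left_unique:
        # Every right row matches at most one left row.
        return float(right_rows)
    if right_unique:
        # Every left row matches at most one right row.
        return float(left_rows)

    # Placeholder fallback: cross product scaled by a guess per predicate.
    return left_rows * right_rows * fallback_selectivity ** len(left_join_cols)


if __name__ == "__main__":
    rows = estimate_join_rows(
        1_000_000, 1_000_000,
        ["cs_item_sk", "cs_order_number"],
        ["cr_item_sk", "cr_order_number"],
        left_pk=["cs_item_sk", "cs_order_number"],
        right_pk=["cr_item_sk", "cr_order_number"],
    )
    print(rows)
```

For the query in this ticket both sides are joined on their full primary keys, so this sketch returns 1,000,000 rows instead of 22,500. Without foreign keys the value remains an upper bound used as an estimate, since nothing guarantees that every row finds a match.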