[ https://issues.apache.org/jira/browse/IGNITE-23968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Orlov updated IGNITE-23968:
--------------------------------------
Fix Version/s: 3.1
> Sql. Improve row count estimation for joins
> -------------------------------------------
>
> Key: IGNITE-23968
> URL: https://issues.apache.org/jira/browse/IGNITE-23968
> Project: Ignite
> Issue Type: Improvement
> Components: sql
> Reporter: Konstantin Orlov
> Assignee: Konstantin Orlov
> Priority: Major
> Labels: ignite-3
> Fix For: 3.1
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The current row count estimation significantly underestimates the result set
> of joins, which causes the optimizer to pick suboptimal plans in certain cases.
> For example, consider the query below:
> {code:sql}
> create table CATALOG_RETURNS
> (
>     <...>
>     constraint CATALOG_RETURNS_PK
>         primary key (CR_ITEM_SK, CR_ORDER_NUMBER)
> );
> create table CATALOG_SALES
> (
>     <...>
>     constraint CATALOG_SALES_PK
>         primary key (CS_ITEM_SK, CS_ORDER_NUMBER)
> );
> explain plan for
> select *
>   from catalog_sales
>       ,catalog_returns
>  where cs_item_sk = cr_item_sk
>    and cs_order_number = cr_order_number;
> --------------
> HashJoin(...): rowcount = 22500.0
>   TableScan(table=[[PUBLIC, CATALOG_RETURNS]]): rowcount = 1000000.0
>   Exchange(...): <...>
>     TableScan(table=[[PUBLIC, CATALOG_SALES]]): rowcount = 1000000.0
> {code}
> When joining two tables with 1 million rows each by primary key, the estimated
> result set size is only 22.5k rows. Things get even worse when there are
> several joins with dimension tables: after a few joins the estimated result
> set is close to 1 (a single row).
> Given that we don't support foreign keys and don't have proper statistics
> yet, we need to introduce heuristics to improve row count estimation for
> joins.
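
To make the arithmetic in the description concrete, here is a minimal Java sketch. It is not the planner's actual code. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the 0.15 per-equality selectivity is an assumed default (chosen because 1,000,000 x 0.15 x 0.15 reproduces the 22,500 figure from the plan above), and the primary-key-aware rule is only one possible heuristic of the kind the issue asks for.

```java
public final class JoinRowCountSketch {
    /** Assumed default selectivity of one equality predicate (illustrative constant). */
    public static final double EQUALITY_SELECTIVITY = 0.15;

    private JoinRowCountSketch() {
    }

    /**
     * Naive estimate: scale the input row count by one selectivity factor
     * per equality predicate, never going below a single row.
     */
    public static double naiveEstimate(double inputRows, int equalityPredicates) {
        double estimate = inputRows * Math.pow(EQUALITY_SELECTIVITY, equalityPredicates);
        return Math.max(estimate, 1.0);
    }

    /**
     * Hypothetical heuristic: when the join keys cover the primary key of one
     * side, every row of the other side matches at most one row, so the other
     * side's row count is used as the estimate. Otherwise fall back to the
     * naive estimate over the larger input.
     */
    public static double primaryKeyAwareEstimate(
            double leftRows,
            double rightRows,
            boolean keysCoverLeftPk,
            boolean keysCoverRightPk,
            int equalityPredicates) {
        if (keysCoverLeftPk && keysCoverRightPk) {
            return Math.min(leftRows, rightRows);
        }
        if (keysCoverLeftPk) {
            return rightRows;
        }
        if (keysCoverRightPk) {
            return leftRows;
        }
        return naiveEstimate(Math.max(leftRows, rightRows), equalityPredicates);
    }

    /** Applies the naive estimate repeatedly to model a chain of joins. */
    public static double chainedNaiveEstimate(double inputRows, int joins, int predicatesPerJoin) {
        double rows = inputRows;
        for (int i = 0; i < joins; i++) {
            rows = naiveEstimate(rows, predicatesPerJoin);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two equality predicates over a 1,000,000-row input: roughly 22,500 rows.
        System.out.println(naiveEstimate(1_000_000, 2));
        // Four such joins in a row: the estimate collapses to the one-row floor.
        System.out.println(chainedNaiveEstimate(1_000_000, 4, 2));
        // Both sides joined on their full primary keys: bounded by the smaller input.
        System.out.println(primaryKeyAwareEstimate(1_000_000, 1_000_000, true, true, 2));
    }
}
```

Under these assumptions the naive estimate shrinks by a factor of about 44 with every two-column join, which is consistent with the collapse toward a single row described above. The key-aware variate keeps the estimate tied to the size of an actual input and needs neither foreign keys nor collected statistics.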
--
This message was sent by Atlassian Jira
(v8.20.10#820010)