[
https://issues.apache.org/jira/browse/IGNITE-23969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Evgeny Stanilovsky reassigned IGNITE-23969:
-------------------------------------------
Assignee: Evgeny Stanilovsky
> Sql. Improve predicate selectivity estimation
> ---------------------------------------------
>
> Key: IGNITE-23969
> URL: https://issues.apache.org/jira/browse/IGNITE-23969
> Project: Ignite
> Issue Type: Improvement
> Components: sql
> Reporter: Konstantin Orlov
> Assignee: Evgeny Stanilovsky
> Priority: Major
> Labels: ignite-3
>
> Let's take a look at query 48 plan from TPS-DS suite:
> {code:java}
> -- start query 1 in stream 0 using template query48.tpl and seed 622697896
> select sum (ss_quantity)
> from store_sales, store, customer_demographics, customer_address, date_dim
> where s_store_sk = ss_store_sk
> and ss_sold_date_sk = d_date_sk and d_year = 1998
> and
> (
> (
> cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'M'
> and
> cd_education_status = 'Unknown'
> and
> ss_sales_price between 100.00 and 150.00
> )
> or
> (
> cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'W'
> and
> cd_education_status = 'College'
> and
> ss_sales_price between 50.00 and 100.00
> )
> or
> (
> cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'D'
> and
> cd_education_status = 'Primary'
> and
> ss_sales_price between 150.00 and 200.00
> )
> )
> and
> (
> (
> ss_addr_sk = ca_address_sk
> and
> ca_country = 'United States'
> and
> ca_state in ('MI', 'GA', 'NH')
> and ss_net_profit between 0 and 2000
> )
> or
> (ss_addr_sk = ca_address_sk
> and
> ca_country = 'United States'
> and
> ca_state in ('TX', 'KY', 'SD')
> and ss_net_profit between 150 and 3000
> )
> or
> (ss_addr_sk = ca_address_sk
> and
> ca_country = 'United States'
> and
> ca_state in ('NY', 'OH', 'FL')
> and ss_net_profit between 50 and 25000
> )
> )
> ;
> -- end query 1 in stream 0 using template query48.tpl
> ColocatedHashAggregate
> Project
> NestedLoopJoin
> HashJoin
> Project(SS_SOLD_DATE_SK=[$30], SS_SOLD_TIME_SK=[$31],
> SS_ITEM_SK=[$32], SS_CUSTOMER_SK=[$33], SS_CDEMO_SK=[$34], SS_HDEMO_SK=[$35],
> SS_ADDR_SK=[$36], SS_STORE_SK=[$37], SS_PROMO_SK=[$38],
> SS_TICKET_NUMBER=[$39], SS_QUANTITY=[$40], SS_WHOLESALE_COST=[$41],
> SS_LIST_PRICE=[$42], SS_SALES_PRICE=[$43], SS_EXT_DISCOUNT_AMT=[$44],
> SS_EXT_SALES_PRICE=[$45], SS_EXT_WHOLESALE_COST=[$46],
> SS_EXT_LIST_PRICE=[$47], SS_EXT_TAX=[$48], SS_COUPON_AMT=[$49],
> SS_NET_PAID=[$50], SS_NET_PAID_INC_TAX=[$51], SS_NET_PROFIT=[$52],
> __part=[$53], S_STORE_SK=[$0], S_STORE_ID=[$1], S_REC_START_DATE=[$2],
> S_REC_END_DATE=[$3], S_CLOSED_DATE_SK=[$4], S_STORE_NAME=[$5],
> S_NUMBER_EMPLOYEES=[$6], S_FLOOR_SPACE=[$7], S_HOURS=[$8], S_MANAGER=[$9],
> S_MARKET_ID=[$10], S_GEOGRAPHY_CLASS=[$11], S_MARKET_DESC=[$12],
> S_MARKET_MANAGER=[$13], S_DIVISION_ID=[$14], S_DIVISION_NAME=[$15],
> S_COMPANY_ID=[$16], S_COMPANY_NAME=[$17], S_STREET_NUMBER=[$18],
> S_STREET_NAME=[$19], S_STREET_TYPE=[$20], S_SUITE_NUMBER=[$21], S_CITY=[$22],
> S_COUNTY=[$23], S_STATE=[$24], S_ZIP=[$25], S_COUNTRY=[$26],
> S_GMT_OFFSET=[$27], S_TAX_PERCENTAGE=[$28], __part1=[$29], CD_DEMO_SK=[$54],
> CD_GENDER=[$55], CD_MARITAL_STATUS=[$56], CD_EDUCATION_STATUS=[$57],
> CD_PURCHASE_ESTIMATE=[$58], CD_CREDIT_RATING=[$59], CD_DEP_COUNT=[$60],
> CD_DEP_EMPLOYED_COUNT=[$61], CD_DEP_COLLEGE_COUNT=[$62], __part0=[$63]):
> rowcount = 1000000.0, cumulative cost = IgniteCost
> [rowCount=7078690.5751953125, cpu=2.5502213524414062E7,
> memory=8725611.3515625, io=1.0, network=1.260006113515625E8], id = 46738
> HashJoin
> Exchange
> TableScan(table=[[PUBLIC, STORE]])
> NestedLoopJoin
> Exchange
> TableScan(
> table=[[PUBLIC, STORE_SALES]],
> filters=[AND(>=(CAST($t22):DECIMAL(10, 2), 0.00),
> <=(CAST($t22):DECIMAL(10, 2), 25000.00), >=($t13, 50.00:DECIMAL(5, 2)),
> <=($t13, 200.00:DECIMAL(5, 2)))]
> ): rowcount = 62500.0, cumulative cost = IgniteCost
> [rowCount=1000000.0], id = 46732
> Exchange
> TableScan(
> table=[[PUBLIC, CUSTOMER_DEMOGRAPHICS]],
> filters=[AND(OR(=($t2, _UTF-8'D'), =($t2, _UTF-8'M'),
> =($t2, _UTF-8'W')), OR(=($t2, _UTF-8'M'), =($t2, _UTF-8'W'), =($t3,
> _UTF-8'Primary')), OR(=($t2, _UTF-8'D'), =($t2, _UTF-8'M'), =($t3,
> _UTF-8'College')), OR(=($t2, _UTF-8'M'), =($t3, _UTF-8'College'), =($t3,
> _UTF-8'Primary')), OR(=($t3, _UTF-8'Unknown'), =($t2, _UTF-8'D'), =($t2,
> _UTF-8'W')), OR(=($t3, _UTF-8'Primary'), =($t3, _UTF-8'Unknown'), =($t2,
> _UTF-8'W')), OR(=($t3, _UTF-8'College'), =($t3, _UTF-8'Unknown'), =($t2,
> _UTF-8'D')), OR(=($t3, _UTF-8'College'), =($t3, _UTF-8'Primary'), =($t3,
> _UTF-8'Unknown')))]
> ): rowcount = 15.2587890625, cumulative cost = IgniteCost
> [rowCount=1000000.0], id = 46734
> Exchange
> TableScan(table=[[PUBLIC, DATE_DIM]])
> Exchange
> TableScan(
> table=[[PUBLIC, CUSTOMER_ADDRESS]],
> filters=[AND(=($t10, _UTF-8'United States'), OR(IS NOT NULL($t10),
> =($t8, _UTF-8'FL'), =($t8, _UTF-8'NY'), =($t8, _UTF-8'OH')), OR(IS NOT
> NULL($t10), =($t8, _UTF-8'KY'), =($t8, _UTF-8'SD'), =($t8, _UTF-8'TX')),
> OR(IS NOT NULL($t10), =($t8, _UTF-8'FL'), =($t8, _UTF-8'KY'), =($t8,
> _UTF-8'NY'), =($t8, _UTF-8'OH'), =($t8, _UTF-8'SD'), =($t8, _UTF-8'TX')),
> OR(=($t8, _UTF-8'GA'), =($t8, _UTF-8'MI'), =($t8, _UTF-8'NH'), IS NOT
> NULL($t10)), OR(=($t8, _UTF-8'FL'), =($t8, _UTF-8'GA'), =($t8, _UTF-8'MI'),
> =($t8, _UTF-8'NH'), =($t8, _UTF-8'NY'), =($t8, _UTF-8'OH'), IS NOT
> NULL($t10)), OR(=($t8, _UTF-8'GA'), =($t8, _UTF-8'KY'), =($t8, _UTF-8'MI'),
> =($t8, _UTF-8'NH'), =($t8, _UTF-8'SD'), =($t8, _UTF-8'TX'), IS NOT
> NULL($t10)), OR(=($t8, _UTF-8'FL'), =($t8, _UTF-8'GA'), =($t8, _UTF-8'KY'),
> =($t8, _UTF-8'MI'), =($t8, _UTF-8'NH'), =($t8, _UTF-8'NY'), =($t8,
> _UTF-8'OH'), =($t8, _UTF-8'SD'), =($t8, _UTF-8'TX')))]
> ): rowcount = 9.1552734375, cumulative cost = IgniteCost
> [rowCount=1000000.0], id = 46742
> {code}
> For CUSTOMER_DEMOGRAPHICS, it's estimated that only 15 rows out of 1kk will
> satisfy the predicate. The same with CUSTOMER_ADDRESS: 9 rows out of 1kk.
> We need to improve this estimation somehow in order to help optimizer to
> choose more optimal plans.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)