[jira] [Assigned] (DRILL-7129) Join with more than 1 condition is not using stats to compute row count estimate

Gautam Parai (JIRA) Thu, 21 Mar 2019 15:42:42 -0700


     [ 
https://issues.apache.org/jira/browse/DRILL-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gautam Parai reassigned DRILL-7129:
-----------------------------------

    Assignee: Gautam Parai

> Join with more than 1 condition is not using stats to compute row count 
> estimate
> --------------------------------------------------------------------------------
>
>                 Key: DRILL-7129
>                 URL: https://issues.apache.org/jira/browse/DRILL-7129
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Anisha Reddy
>            Assignee: Gautam Parai
>            Priority: Major
>             Fix For: 1.17.0
>
>
> Below are the details: 
>  
> {code:java}
> 0: jdbc:drill:drillbit=10.10.101.108> select count(*) from 
> `table_stats/Tpch0.01/parquet/lineitem`; +---------+ | EXPR$0 | +---------+ | 
> 57068 | +---------+ 1 row selected (0.179 seconds)
>  0: jdbc:drill:drillbit=10.10.101.108> select count(*) from 
> `table_stats/Tpch0.01/parquet/partsupp`; +---------+ | EXPR$0 | +---------+ | 
> 7474 | +---------+ 1 row selected (0.171 seconds) 
> 0: jdbc:drill:drillbit=10.10.101.108> select count(*) from 
> `table_stats/Tpch0.01/parquet/lineitem` l, 
> `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey 
> and l.l_suppkey = ps.ps_suppkey; +---------+ | EXPR$0 | +---------+ | 53401 | 
> +---------+ 1 row selected (0.769 seconds)
>  0: jdbc:drill:drillbit=10.10.101.108> explain plan including all attributes 
> for select * from `table_stats/Tpch0.01/parquet/lineitem` l, 
> `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey 
> and l.l_suppkey = ps.ps_suppkey; 
> +----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
>  | text | json | 
> +----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
>  | 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): 
> rowcount = 57068.0, cumulative cost = {313468.8 rows, 2110446.8 cpu, 193626.0 
> io, 0.0 network, 197313.6 memory}, id = 107578 00-01 ProjectAllowDup(**=[$0], 
> **0=[$1]) : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount 
> = 57068.0, cumulative cost = {307762.0 rows, 2104740.0 cpu, 193626.0 io, 0.0 
> network, 197313.6 memory}, id = 107577 00-02 Project(T10¦¦**=[$0], 
> T11¦¦**=[$3]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, DYNAMIC_STAR 
> T11¦¦**): rowcount = 57068.0, cumulative cost = {250694.0 rows, 1990604.0 
> cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107576 00-03 
> HashJoin(condition=[AND(=($1, $4), =($2, $5))], joinType=[inner], semi-join: 
> =[false]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY 
> l_suppkey, DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 
> 57068.0, cumulative cost = {193626.0 rows, 1876468.0 cpu, 193626.0 io, 0.0 
> network, 197313.6 memory}, id = 107575 00-05 Project(T10¦¦**=[$0], 
> l_partkey=[$1], l_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, 
> ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = 
> {114136.0 rows, 342408.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 
> 107572 00-07 Scan(table=[[dfs, drilltestdir, 
> table_stats/Tpch0.01/parquet/lineitem]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/lineitem]], 
> selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/lineitem, 
> numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, 
> `l_partkey`, `l_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY 
> l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {57068.0 
> rows, 171204.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107571 00-04 
> Project(T11¦¦**=[$0], ps_partkey=[$1], ps_suppkey=[$2]) : rowType = 
> RecordType(DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 
> 7474.0, cumulative cost = {14948.0 rows, 44844.0 cpu, 22422.0 io, 0.0 
> network, 0.0 memory}, id = 107574 00-06 Scan(table=[[dfs, drilltestdir, 
> table_stats/Tpch0.01/parquet/partsupp]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/partsupp]], 
> selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/partsupp, 
> numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, 
> `ps_partkey`, `ps_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY 
> ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {7474.0 
> rows, 22422.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107573
> {code}
> The ndv for l_partkey = 2000
> ps_partkey = 1817
> l_supkey = 100
> ps_suppkey = 100 
> We see that such joins is just taking the max of left side and the right side 
> table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Assigned] (DRILL-7129) Join with more than 1 condition is not using stats to compute row count estimate

Reply via email to