[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712244#comment-14712244
 ] 

Hao Zhu commented on DRILL-3710:
--------------------------------

a. No optimization
{code}
explain plan for
select count(1) from h1_passwords where cast(col2 as int) in 
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02        Project($f0=[1])
00-03          SelectionVectorRemover
00-04            Filter(condition=[OR(=(CAST($0):INTEGER, 1), 
=(CAST($0):INTEGER, 2), =(CAST($0):INTEGER, 3), =(CAST($0):INTEGER, 4), 
=(CAST($0):INTEGER, 5), =(CAST($0):INTEGER, 6), =(CAST($0):INTEGER, 7), 
=(CAST($0):INTEGER, 8), =(CAST($0):INTEGER, 9), =(CAST($0):INTEGER, 10), 
=(CAST($0):INTEGER, 11), =(CAST($0):INTEGER, 12), =(CAST($0):INTEGER, 13), 
=(CAST($0):INTEGER, 14), =(CAST($0):INTEGER, 15), =(CAST($0):INTEGER, 16), 
=(CAST($0):INTEGER, 17), =(CAST($0):INTEGER, 18), =(CAST($0):INTEGER, 19))])
00-05              Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:h1_passwords), 
inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], 
columns=[`col2`], partitions= null]])
{code}
b. With optimization
{code}
explain plan for
select count(1) from h1_passwords where cast(col2 as int) in 
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02        Project($f0=[1])
00-03          Project(f6=[$1], ROW_VALUE=[$0])
00-04            MergeJoin(condition=[=($1, $0)], joinType=[inner])
00-06              SelectionVectorRemover
00-08                Sort(sort0=[$0], dir0=[ASC])
00-10                  HashAgg(group=[{0}])
00-12                    Values
00-05              SelectionVectorRemover
00-07                Sort(sort0=[$0], dir0=[ASC])
00-09                  Project(f6=[CAST($0):INTEGER])
00-11                    Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:h1_passwords), 
inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], 
columns=[`col2`], partitions= null]])
{code}

> Make the 20 in-list optimization configurable
> ---------------------------------------------
>
>                 Key: DRILL-3710
>                 URL: https://issues.apache.org/jira/browse/DRILL-3710
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 1.1.0
>            Reporter: Hao Zhu
>            Assignee: Jinfeng Ni
>
> If Drill has more than 20 in-lists , Drill can do an optimization to convert 
> that in-lists into a small hash table in memory, and then do a table join 
> instead.
> This can improve the performance of the query which has many in-lists.
> Could we make "20" configurable? So that we do not need to add duplicate/junk 
> in-list to make it more than 20.
> Sample query is :
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to