[ 
https://issues.apache.org/jira/browse/DRILL-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755771#comment-16755771
 ] 

Sorabh Hamirwasia commented on DRILL-7016:
------------------------------------------

Looked more into the issue and it's *not* because of filter on top of Runtime 
Filter operator. While generating RuntimeFilter there is a bug in which left 
and right side fields in join condition is decided to be used in BloomFilter. 
It doesn't uses the ordinals for right keys and instead directly get the field 
name starting with index 0 for each left keys. Hence with changing order of 
filter and join condition the ordinals of right fields changes and bloom filter 
is generated for wrong right side field. For example: In case 1 below 
bloomFilter is generated on right side column c_mktsegment instead of 
c_custkey. Whereas in case 2 bloomFilter is generated on right side column 
c_custkey.

 
*Case 1:*
{code:java}
01-04 HashJoin(condition=[=($2, $0)], joinType=[inner], semi-join: =[false]) : 
rowType = RecordType(ANY o_custkey, ANY c_mktsegment, ANY c_custkey): rowcount 
= 1.5E7, cumulative cost = {6.3675E7 rows, 2.38725E8 cpu, 1.8E7 io, 3.87072E9 
network, 3960000.0000000005 memory}, id = 65202{code}
1 row selected (3.654 seconds)
 
{code:java}
0: jdbc:drill:drillbits=10.10.100.188> select count(*) 
. . . . . . . . . . . . . . semicolon> from 
. . . . . . . . . . . . . . semicolon> customer c, 
. . . . . . . . . . . . . . semicolon> orders o 
. . . . . . . . . . . . . . semicolon> where c.c_mktsegment = 'HOUSEHOLD'
. . . . . . . . . . . . . . semicolon> and c.c_custkey = o.o_custkey;{code}
 
+---------+
| EXPR$0  |
+---------+
| 19826   |
+---------+
 
*Case 2:*
{code:java}
01-04 HashJoin(condition=[=($1, $0)], joinType=[inner], semi-join: =[false]) : 
rowType = RecordType(ANY o_custkey, ANY c_custkey, ANY c_mktsegment): rowcount 
= 1.5E7, cumulative cost = {6.3675E7 rows, 2.38725E8 cpu, 1.8E7 io, 3.87072E9 
network, 3960000.0000000005 memory}, id = 66134{code}
1 row selected (1.328 seconds)
 
{code:java}
0: jdbc:drill:drillbits=10.10.100.188> select count(*)
. . . . . . . . . . . . . . semicolon> from 
. . . . . . . . . . . . . . semicolon> customer c,
. . . . . . . . . . . . . . semicolon> orders o
. . . . . . . . . . . . . . semicolon> where c.c_custkey = o.o_custkey and 
. . . . . . . . . . . . . . semicolon> c.c_mktsegment = 'HOUSEHOLD' 
. . . . . . . . . . . . . . semicolon> ;{code}
 
+----------+
|  EXPR$0  |
+----------+
| 2990828  |
+----------+

> Wrong query result with RuntimeFilter enabled when order of join and filter 
> condition is swapped
> ------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7016
>                 URL: https://issues.apache.org/jira/browse/DRILL-7016
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.16.0
>            Reporter: Sorabh Hamirwasia
>            Assignee: Sorabh Hamirwasia
>            Priority: Major
>             Fix For: 1.16.0
>
>
> Below 2 queries generate different results:
> *Query1: Result: 19826* 
> {code:java}
> select count(*)
> from 
>   customer c, 
>   orders o
> where 
>   c.c_mktsegment = 'HOUSEHOLD' 
>   and c.c_custkey = o.o_custkey 
> {code}
> *Query2: Result: 2990828* 
> {code:java}
> select count(*)
> from 
>   customer c, 
>   orders o
> where 
>   c.c_custkey = o.o_custkey and
>   c.c_mktsegment = 'HOUSEHOLD' 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to