[
https://issues.apache.org/jira/browse/DRILL-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755771#comment-16755771
]
Sorabh Hamirwasia commented on DRILL-7016:
------------------------------------------
Looked more into the issue and it's *not* because of filter on top of Runtime
Filter operator. While generating RuntimeFilter there is a bug in which left
and right side fields in join condition is decided to be used in BloomFilter.
It doesn't uses the ordinals for right keys and instead directly get the field
name starting with index 0 for each left keys. Hence with changing order of
filter and join condition the ordinals of right fields changes and bloom filter
is generated for wrong right side field. For example: In case 1 below
bloomFilter is generated on right side column c_mktsegment instead of
c_custkey. Whereas in case 2 bloomFilter is generated on right side column
c_custkey.
*Case 1:*
{code:java}
01-04 HashJoin(condition=[=($2, $0)], joinType=[inner], semi-join: =[false]) :
rowType = RecordType(ANY o_custkey, ANY c_mktsegment, ANY c_custkey): rowcount
= 1.5E7, cumulative cost = {6.3675E7 rows, 2.38725E8 cpu, 1.8E7 io, 3.87072E9
network, 3960000.0000000005 memory}, id = 65202{code}
1 row selected (3.654 seconds)
{code:java}
0: jdbc:drill:drillbits=10.10.100.188> select count(*)
. . . . . . . . . . . . . . semicolon> from
. . . . . . . . . . . . . . semicolon> customer c,
. . . . . . . . . . . . . . semicolon> orders o
. . . . . . . . . . . . . . semicolon> where c.c_mktsegment = 'HOUSEHOLD'
. . . . . . . . . . . . . . semicolon> and c.c_custkey = o.o_custkey;{code}
+---------+
| EXPR$0 |
+---------+
| 19826 |
+---------+
*Case 2:*
{code:java}
01-04 HashJoin(condition=[=($1, $0)], joinType=[inner], semi-join: =[false]) :
rowType = RecordType(ANY o_custkey, ANY c_custkey, ANY c_mktsegment): rowcount
= 1.5E7, cumulative cost = {6.3675E7 rows, 2.38725E8 cpu, 1.8E7 io, 3.87072E9
network, 3960000.0000000005 memory}, id = 66134{code}
1 row selected (1.328 seconds)
{code:java}
0: jdbc:drill:drillbits=10.10.100.188> select count(*)
. . . . . . . . . . . . . . semicolon> from
. . . . . . . . . . . . . . semicolon> customer c,
. . . . . . . . . . . . . . semicolon> orders o
. . . . . . . . . . . . . . semicolon> where c.c_custkey = o.o_custkey and
. . . . . . . . . . . . . . semicolon> c.c_mktsegment = 'HOUSEHOLD'
. . . . . . . . . . . . . . semicolon> ;{code}
+----------+
| EXPR$0 |
+----------+
| 2990828 |
+----------+
> Wrong query result with RuntimeFilter enabled when order of join and filter
> condition is swapped
> ------------------------------------------------------------------------------------------------
>
> Key: DRILL-7016
> URL: https://issues.apache.org/jira/browse/DRILL-7016
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.16.0
> Reporter: Sorabh Hamirwasia
> Assignee: Sorabh Hamirwasia
> Priority: Major
> Fix For: 1.16.0
>
>
> Below 2 queries generate different results:
> *Query1: Result: 19826*
> {code:java}
> select count(*)
> from
> customer c,
> orders o
> where
> c.c_mktsegment = 'HOUSEHOLD'
> and c.c_custkey = o.o_custkey
> {code}
> *Query2: Result: 2990828*
> {code:java}
> select count(*)
> from
> customer c,
> orders o
> where
> c.c_custkey = o.o_custkey and
> c.c_mktsegment = 'HOUSEHOLD'
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)