Yes. I'm interested in this and going to look deeper in it tomorrow. Just filed a JIRA: https://issues.apache.org/jira/browse/IMPALA-8386
On Thu, Apr 4, 2019 at 11:20 PM Todd Lipcon <t...@cloudera.com> wrote: > Sounds like we should file a high priority JIRA (any "wrong results" bugs > should probably be considered critical or blocker). Quanlong, any interest > in working on this issue? > > -Todd > > On Thu, Apr 4, 2019 at 8:17 AM Quanlong Huang <huangquanl...@gmail.com> > wrote: > > > +1 for Csaba's analysis. Looks like similiar to this > > https://issues.apache.org/jira/browse/IMPALA-3126 > > > > On Thu, Apr 4, 2019 at 11:08 PM Csaba Ringhofer < > csringho...@cloudera.com> > > wrote: > > > > > Hi! > > > > > > I have checked the queries, and I can verify that Impala incorrectly > > > returns 1 row while the same query with Hive (or common sense..) > returns > > 2 > > > rows. > > > > > > > "but if remove the "t2.amount2" like this:" > > > Indeed, the issue seems to be related to returning the same aggregate > > twice > > > + the fact that one of these values is NULL. The planner introduces a > > > predicate that checks if amount1=amount2, which is false, if both > values > > > are NULL: > > > > > > explain select * from (select t2.a_id,t2.amount1,t2.amount2 > > > from( select a_id from a) t1 > > > left outer join ( > > > select c.a_id,sum(amount) as amount1,sum(amount) as amount2 > > > from b join c on b.b_id = c.b_id group by c.a_id) t2 > > > on t1.a_id = t2.a_id) t; > > > > > > results in: > > > PLAN-ROOT SINK > > > | > > > 05:HASH JOIN [RIGHT OUTER JOIN] > > > | hash predicates: c.a_id = a_id > > > | other predicates: sum(amount) = sum(amount) <----- I don't know why > > this > > > predicate is added. > > > | runtime filters: RF000 <- a_id > > > | row-size=16B cardinality=2 > > > .... > > > > > > If I EXPLAIN the query without the outer select, the sum(amount) = > > > sum(amount) is not added, which explains the difference. > > > > > > I do not know why the planner adds this predicate, my guess is that > this > > is > > > some kind of bug in Impala. > > > > > > > > > On Thu, Apr 4, 2019 at 2:27 PM skyyws <sky...@163.com> wrote: > > > > > > > Hi all, I met a problem of left outer join recently, and I reproduce > > this > > > > problem by some simple test data with three tables a, b, c: > > > > table A > > > > +------+ > > > > | a_id | > > > > +------+ > > > > | 1 | > > > > | 2 | > > > > +------+ > > > > table B > > > > +------+--------+ > > > > | b_id | amount | > > > > +------+--------+ > > > > | 1 | 10 | > > > > | 1 | 20 | > > > > | 2 | NULL | > > > > +------+--------+ > > > > table C > > > > +------+------+ > > > > | a_id | b_id | > > > > +------+------+ > > > > | 1 | 1 | > > > > | 2 | 2 | > > > > +------+------+ > > > > The sql below: > > > > select count(1) from ( > > > > select t2.a_id,t2.amount1,t2.amount2 > > > > from( select a_id from a) t1 > > > > left outer join ( > > > > select c.a_id,sum(amount) as amount1,sum(amount) as amount2 > > > > from b join c on b.b_id = c.b_id group by c.a_id) t2 > > > > on t1.a_id = t2.a_id > > > > ) t; > > > > +----------+ > > > > | count(1) | > > > > +----------+ > > > > | 1 | > > > > +----------+ > > > > but if remove the "t2.amount2" like this: > > > > select count(1) from ( > > > > select t2.a_id,t2.amount1 > > > > from( select a_id from a) t1 > > > > left outer join ( > > > > select c.a_id,sum(amount) as amount1,sum(amount) as amount2 > > > > from b join c on b.b_id = c.b_id group by c.a_id) t2 > > > > on t1.a_id = t2.a_id > > > > ) t; > > > > +----------+ > > > > | count(1) | > > > > +----------+ > > > > | 2 | > > > > +----------+ > > > > Here is the result of two subquery without count(1): > > > > +------+---------+---------+ > > > > | a_id | amount1 | amount2 | > > > > +------+---------+---------+ > > > > | 1 | 30 | 30 | > > > > | 2 | NULL | NULL | > > > > +------+---------+---------+ why the count(1) of this > > > > resultset is 1? > > > > +------+---------+ > > > > | a_id | amount1 | > > > > +------+---------+ > > > > | 1 | 30 | > > > > | 2 | NULL | > > > > +------+---------+ why the count(1) of this > > > > resultset is 2? > > > > I want to ask why the first sql return just 1, but second return 2,is > > > this > > > > correct or impala bug?How impala deal with count aggr.? > > > > If I change the sum to other aggr. function like count/max/min, > result > > is > > > > same. I test this on 2.12.0 and 3.1.0 version. > > > > > > > > > > > > > > > > -- > Todd Lipcon > Software Engineer, Cloudera >