Sounds like we should file a high priority JIRA (any "wrong results" bugs
should probably be considered critical or blocker). Quanlong, any interest
in working on this issue?

-Todd

On Thu, Apr 4, 2019 at 8:17 AM Quanlong Huang <huangquanl...@gmail.com>
wrote:

> +1 for Csaba's analysis. Looks like similiar to this
> https://issues.apache.org/jira/browse/IMPALA-3126
>
> On Thu, Apr 4, 2019 at 11:08 PM Csaba Ringhofer <csringho...@cloudera.com>
> wrote:
>
> > Hi!
> >
> > I have checked the queries, and I can verify that Impala incorrectly
> > returns 1 row while the same query with Hive (or common sense..) returns
> 2
> > rows.
> >
> > > "but if remove the "t2.amount2"  like this:"
> > Indeed, the issue seems to be related to returning the same aggregate
> twice
> > + the fact that one of these values is NULL. The planner introduces a
> > predicate that checks if amount1=amount2, which is false, if both values
> > are NULL:
> >
> > explain select * from (select t2.a_id,t2.amount1,t2.amount2
> >     from( select a_id from a) t1
> >     left outer join (
> >         select c.a_id,sum(amount) as amount1,sum(amount) as amount2
> >         from b join c  on b.b_id = c.b_id group by c.a_id) t2
> >     on t1.a_id = t2.a_id) t;
> >
> > results in:
> > PLAN-ROOT SINK
> > |
> > 05:HASH JOIN [RIGHT OUTER JOIN]
> > |  hash predicates: c.a_id = a_id
> > |  other predicates: sum(amount) = sum(amount) <----- I don't know why
> this
> > predicate is added.
> > |  runtime filters: RF000 <- a_id
> > |  row-size=16B cardinality=2
> > ....
> >
> > If I EXPLAIN the query without the outer select, the sum(amount) =
> > sum(amount) is not added, which explains the difference.
> >
> > I do not know why the planner adds this predicate, my guess is that this
> is
> > some kind of bug in Impala.
> >
> >
> > On Thu, Apr 4, 2019 at 2:27 PM skyyws <sky...@163.com> wrote:
> >
> > > Hi all, I met a problem of left outer join recently, and I reproduce
> this
> > > problem by some simple test data with three tables a, b, c:
> > > table A
> > > +------+
> > > | a_id |
> > > +------+
> > > | 1    |
> > > | 2    |
> > > +------+
> > > table B
> > > +------+--------+
> > > | b_id | amount |
> > > +------+--------+
> > > | 1    | 10     |
> > > | 1    | 20     |
> > > | 2    | NULL   |
> > > +------+--------+
> > > table C
> > > +------+------+
> > > | a_id | b_id |
> > > +------+------+
> > > | 1    | 1    |
> > > | 2    | 2    |
> > > +------+------+
> > > The sql below:
> > > select count(1) from (
> > >     select t2.a_id,t2.amount1,t2.amount2
> > >     from( select a_id from a) t1
> > >     left outer join (
> > >         select c.a_id,sum(amount) as amount1,sum(amount) as amount2
> > >         from b join c  on b.b_id = c.b_id group by c.a_id) t2
> > >     on t1.a_id = t2.a_id
> > > ) t;
> > > +----------+
> > > | count(1) |
> > > +----------+
> > > | 1        |
> > > +----------+
> > > but if remove the "t2.amount2"  like this:
> > > select count(1) from (
> > >     select t2.a_id,t2.amount1
> > >     from( select a_id from a) t1
> > >     left outer join (
> > >         select c.a_id,sum(amount) as amount1,sum(amount) as amount2
> > >         from b join c  on b.b_id = c.b_id group by c.a_id) t2
> > >     on t1.a_id = t2.a_id
> > > ) t;
> > > +----------+
> > > | count(1) |
> > > +----------+
> > > | 2        |
> > > +----------+
> > > Here is the result of two subquery without count(1):
> > > +------+---------+---------+
> > > | a_id | amount1 | amount2 |
> > > +------+---------+---------+
> > > | 1    | 30      | 30      |
> > > | 2    | NULL    | NULL    |
> > > +------+---------+---------+             why the count(1) of this
> > > resultset is 1?
> > > +------+---------+
> > > | a_id | amount1 |
> > > +------+---------+
> > > | 1    | 30      |
> > > | 2    | NULL    |
> > > +------+---------+                           why the count(1) of this
> > > resultset is 2?
> > > I want to ask why the first sql return just 1, but second return 2,is
> > this
> > > correct or impala bug?How impala deal with count aggr.?
> > > If I change the sum to other aggr. function like count/max/min, result
> is
> > > same. I test this on 2.12.0 and 3.1.0 version.
> > >
> > >
> >
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Reply via email to