[ 
https://issues.apache.org/jira/browse/IMPALA-8386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032950#comment-17032950
 ] 

ASF subversion and git services commented on IMPALA-8386:
---------------------------------------------------------

Commit d971b19dee289d8fe54eebf484cb14ee6d56f60e in impala's branch 
refs/heads/master from Kurt Deschler
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d971b19 ]

IMPALA-9358: Query slowdown with inline views and hundreds of columns

IMPALA-8386 introduced an expensive precondition check using the function
ExprSubstitutionMap.checkComposedFrom(). This check has significant
performance impact on statements that contain inline views with hundreds
of columns. Most of the cost is in the get() calls used to find
expressions in the local substitution map.

The fix is to add a getWithHint() call that uses the current loop index as a
starting point to search for expressions. This leverages the fact that
expressions have identical positions in both substitution maps in most
common cases.

A more generic approach would be to accelerate expression equality search
using hash functions but that would be a much riskier fix and Impala
currently lacks the infrasturucture to so.

Testing:
Performance testing with a query with 1000 expressions of the
following form:
  with a as (select c1 c1, c1 c2, c1 c3, ... from t)
  select c1, c2, c3, ... from a;

repro query went from 12 sec to 1 sec.
There was no noticeable time spent in the precondition now.

Change-Id: I77423d9c10e1edbb505cb210b5c072281b5d7cfc
Reviewed-on: http://gerrit.cloudera.org:8080/15157
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Incorrect predicate in a left outer join query
> ----------------------------------------------
>
>                 Key: IMPALA-8386
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8386
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.12.0, Impala 3.2.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: correctness
>             Fix For: Impala 3.3.0
>
>
> skyyws <[email protected]> reported a bug [in the mailing 
> list|https://lists.apache.org/thread.html/0bdbbaa6bb35b552f050ae30587b7d75b78a72ec60007a8bc0a4a8a9@%3Cdev.impala.apache.org%3E]
>  on the following data set:
> {code:java}
> table A
> +------+
> | a_id |
> +------+
> | 1    |
> | 2    |
> +------+
> table B
> +------+--------+
> | b_id | amount |
> +------+--------+
> | 1    | 10     |
> | 1    | 20     |
> | 2    | NULL   |
> +------+--------+
> table C
> +------+------+
> | a_id | b_id |
> +------+------+
> | 1    | 1    |
> | 2    | 2    |
> +------+------+{code}
> The following query returns a wrong result "1":
> {code:java}
> select count(1) from (
>     select t2.a_id,t2.amount1,t2.amount2
>     from( select a_id from a) t1
>     left outer join (
>         select c.a_id,sum(amount) as amount1,sum(amount) as amount2
>         from b join c  on b.b_id = c.b_id group by c.a_id) t2
>     on t1.a_id = t2.a_id
> ) t;
> {code}
> Removing "t2.amount2" can get the right result "2":
> {code:java}
> select count(1) from (
>     select t2.a_id,t2.amount1
>     from( select a_id from a) t1
>     left outer join (
>         select c.a_id,sum(amount) as amount1,sum(amount) as amount2
>         from b join c  on b.b_id = c.b_id group by c.a_id) t2
>     on t1.a_id = t2.a_id
> ) t;
> {code}
> The problem is that in query 1, Impala generates a wrong predicate 
> "sum(amount) = sum(amount)" which rejects nulls:
> {code:java}
> +-------------------------------------------------------------+
> | Explain String                                              |
> +-------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=10.84MB Threads=8 |
> | Per-Host Resource Estimates: Memory=143MB                   |
> | Codegen disabled by planner                                 |
> |                                                             |
> | PLAN-ROOT SINK                                              |
> | |                                                           |
> | 12:AGGREGATE [FINALIZE]                                     |
> | |  output: count:merge(*)                                   |
> | |  row-size=8B cardinality=1                                |
> | |                                                           |
> | 11:EXCHANGE [UNPARTITIONED]                                 |
> | |                                                           |
> | 06:AGGREGATE                                                |
> | |  output: count(*)                                         |
> | |  row-size=8B cardinality=1                                |
> | |                                                           |
> | 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                |
> | |  hash predicates: c.a_id = a_id                           |
> | |  other predicates: sum(amount) = sum(amount)      <---------- Wrong 
> inferred predicate which incorrectly reject nulls
> | |  runtime filters: RF000 <- a_id                           |
> | |  row-size=16B cardinality=2                               |
> | |                                                           |
> | |--10:EXCHANGE [HASH(a_id)]                                 |
> | |  |                                                        |
> | |  00:SCAN HDFS [default.a]                                 |
> | |     partitions=1/1 files=1 size=4B                        |
> | |     row-size=4B cardinality=2                             |
> | |                                                           |
> | 09:AGGREGATE [FINALIZE]                                     |
> | |  output: sum:merge(amount)                                |
> | |  group by: c.a_id                                         |
> | |  row-size=12B cardinality=2                               |
> | |                                                           |
> | 08:EXCHANGE [HASH(c.a_id)]                                  |
> | |                                                           |
> | 04:AGGREGATE [STREAMING]                                    |
> | |  output: sum(amount)                                      |
> | |  group by: c.a_id                                         |
> | |  row-size=12B cardinality=2                               |
> | |                                                           |
> | 03:HASH JOIN [INNER JOIN, BROADCAST]                        |
> | |  hash predicates: b.b_id = c.b_id                         |
> | |  runtime filters: RF002 <- c.b_id                         |
> | |  row-size=16B cardinality=3                               |
> | |                                                           |
> | |--07:EXCHANGE [BROADCAST]                                  |
> | |  |                                                        |
> | |  02:SCAN HDFS [default.c]                                 |
> | |     partitions=1/1 files=1 size=8B                        |
> | |     runtime filters: RF000 -> default.c.a_id              |
> | |     row-size=8B cardinality=2                             |
> | |                                                           |
> | 01:SCAN HDFS [default.b]                                    |
> |    partitions=1/1 files=1 size=15B                          |
> |    runtime filters: RF002 -> b.b_id                         |
> |    row-size=8B cardinality=3                                |
> +-------------------------------------------------------------+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to