[ https://issues.apache.org/jira/browse/IMPALA-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914375#comment-17914375 ]

ASF subversion and git services commented on IMPALA-2945:
---------------------------------------------------------

Commit 3118e41c26f730a06d42994e678cab694c787649 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3118e41c2 ]

IMPALA-2945: Account for duplicate keys on multiple nodes preAgg

AggregationNode.computeStats() estimates cardinality under a single-node
assumption. This can be an underestimation for a preaggregation node,
because the same grouping key may exist on multiple nodes during
preaggregation.

This patch adjusts the cardinality estimate using the following model for
the number of distinct values in a random sample of k rows, previously
used in the ProcessingCost model by IMPALA-12657 and IMPALA-13644.

Assume we are picking k rows from an infinite population with NDV distinct
values, uniformly distributed. The probability of a given value not
appearing in the sample is

((NDV - 1) / NDV) ^ k

because we make k draws, and each draw has a (NDV - 1) / NDV chance of not
being that value. Therefore the probability of a given value appearing in
the sample is:

1 - ((NDV - 1) / NDV) ^ k

and the expected number of distinct values in the sample is:

(1 - ((NDV - 1) / NDV) ^ k) * NDV
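
As a rough, hedged illustration of this model (the class and method names
below are hypothetical, not the actual patch code):

{code}
// Illustration only; not the patch code in AggregationNode.computeStats().
public class PreAggNdvModel {
  /**
   * Expected number of distinct values seen when drawing k rows from an
   * infinite population with NDV uniformly distributed distinct values:
   *   (1 - ((NDV - 1) / NDV) ^ k) * NDV
   */
  static double expectedDistinctInSample(double ndv, double k) {
    if (ndv <= 0 || k <= 0) return 0;
    double pValueMissing = Math.pow((ndv - 1.0) / ndv, k);
    return (1.0 - pValueMissing) * ndv;
  }

  public static void main(String[] args) {
    // Node 03 in the plan below: ~200M distinct l_partkey values and
    // roughly 6.00B / 20 hosts = ~300M input rows per node.
    double perNodeNdv = expectedDistinctInSample(200_052_064d, 5_999_989_709d / 20);
    // ~155M distinct keys per node; times 20 nodes is ~3.1B preaggregation
    // output rows, close to the 3.11B actually observed at node 03.
    System.out.println(perNodeNdv);
    System.out.println(perNodeNdv * 20);
  }
}
{code}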

A new query option, ESTIMATE_DUPLICATE_IN_PREAGG, is added to control
whether the new estimation logic is used.

Testing:
- Pass core tests.

Change-Id: I04c563e59421928875b340cb91654b9d4bc80b55
Reviewed-on: http://gerrit.cloudera.org:8080/22047
Reviewed-by: Riza Suminto <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Pre-aggregation cardinality estimates do not take into account data 
> distribution
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-2945
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2945
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.0, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, 
> Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 
> 2.11.0, Impala 2.12.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: planner, resource-management
>         Attachments: ESTIMATE_DUPLICATE_IN_PREAGG_FALSE.txt, 
> ESTIMATE_DUPLICATE_IN_PREAGG_TRUE.txt, tpcds_q4_baseline.txt
>
>
> When computing the per-host memory estimate for local aggregations, the 
> planner does not take into account that data is randomly distributed across 
> nodes, which leads to significant underestimation in some cases. The 
> suggested fix is to use min(agg input cardinality, NDV * #hosts) as the 
> per-node cardinality for the per-node memory estimate (sketched below).
> Impact: In the query below, the planner significantly underestimates the 
> per-node memory of agg node 03 as 3.28 GB, but the actual is 24.77 GB.
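> A minimal sketch of that suggested cap (the class and method names below 
> are illustrative, not Impala's actual planner code):
> {code}
> // Illustration only: the per-node cardinality used for the preaggregation
> // memory estimate is capped by the aggregation input cardinality and by
> // NDV times the number of hosts.
> public class PreAggCap {
>   static long perNodePreAggCardinality(long aggInputCardinality, long ndv, int numHosts) {
>     return Math.min(aggInputCardinality, ndv * (long) numHosts);
>   }
>
>   public static void main(String[] args) {
>     // Agg node 03 below: ~6.00B input rows, ~200M distinct l_partkey values, 20 hosts.
>     System.out.println(perNodePreAggCardinality(5_999_989_709L, 200_052_064L, 20));
>   }
> }
> {code}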
> Query
> {code}
> select sum(l_extendedprice) / 7.0 as avg_yearly
> from
>   lineitem,
>   part
> where
>   p_partkey = l_partkey
>   and p_brand = 'Brand#23'
>   and p_container = 'MED BOX'
>   and l_quantity < (
>     select
>       0.2 * avg(l_quantity)
>     from
>       lineitem
>     where
>       l_partkey = p_partkey
>   )
> {code}
> Plan
> {code}
> 12:AGGREGATE [FINALIZE]
> |  output: sum:merge(l_extendedprice)
> |  hosts=20 per-host-mem=unavailable
> |  tuple-ids=6 row-size=16B cardinality=1
> |
> 11:EXCHANGE [UNPARTITIONED]
> |  hosts=20 per-host-mem=unavailable
> |  tuple-ids=6 row-size=16B cardinality=1
> |
> 06:AGGREGATE
> |  output: sum(l_extendedprice)
> |  hosts=20 per-host-mem=10.00MB
> |  tuple-ids=6 row-size=16B cardinality=1
> |
> 05:HASH JOIN [RIGHT SEMI JOIN, PARTITIONED]
> |  hash predicates: l_partkey = p_partkey
> |  other join predicates: l_quantity < 0.2 * avg(l_quantity)
> |  hosts=20 per-host-mem=125.18MB
> |  tuple-ids=0,1 row-size=80B cardinality=29992141
> |
> |--10:EXCHANGE [HASH(p_partkey)]
> |  |  hosts=20 per-host-mem=0B
> |  |  tuple-ids=0,1 row-size=80B cardinality=29992141
> |  |
> |  04:HASH JOIN [INNER JOIN, BROADCAST]
> |  |  hash predicates: l_partkey = p_partkey
> |  |  hosts=20 per-host-mem=58.30MB
> |  |  tuple-ids=0,1 row-size=80B cardinality=29992141
> |  |
> |  |--09:EXCHANGE [BROADCAST]
> |  |  |  hosts=20 per-host-mem=0B
> |  |  |  tuple-ids=1 row-size=56B cardinality=1000000
> |  |  |
> |  |  01:SCAN HDFS [tpch_1000_decimal_parquet.part, RANDOM]
> |  |     partitions=1/1 files=40 size=6.38GB
> |  |     predicates: p_brand = 'Brand#23', p_container = 'MED BOX'
> |  |     table stats: 200000000 rows total
> |  |     column stats: all
> |  |     hosts=20 per-host-mem=264.00MB
> |  |     tuple-ids=1 row-size=56B cardinality=1000000
> |  |
> |  00:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM]
> |     partitions=1/1 files=880 size=216.61GB
> |     table stats: 5999989709 rows total
> |     column stats: all
> |     hosts=20 per-host-mem=264.00MB
> |     tuple-ids=0 row-size=24B cardinality=5999989709
> |
> 08:AGGREGATE [FINALIZE]
> |  output: avg:merge(l_quantity)
> |  group by: l_partkey
> |  hosts=20 per-host-mem=167.89MB
> |  tuple-ids=4 row-size=16B cardinality=200052064
> |
> 07:EXCHANGE [HASH(l_partkey)]
> |  hosts=20 per-host-mem=0B
> |  tuple-ids=3 row-size=16B cardinality=200052064
> |
> 03:AGGREGATE
> |  output: avg(l_quantity)
> |  group by: l_partkey
> |  hosts=20 per-host-mem=3.28GB
> |  tuple-ids=3 row-size=16B cardinality=200052064
> |
> 02:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM]
>    partitions=1/1 files=880 size=216.61GB
>    table stats: 5999989709 rows total
>    column stats: all
>    hosts=20 per-host-mem=176.00MB
>    tuple-ids=2 row-size=16B cardinality=5999989709
> {code}
> Summary
> ||Operator           ||#Hosts||Avg Time ||Max Time ||#Rows  ||Est. #Rows||Peak Mem ||Est. Peak Mem||Detail                        ||
> |12:AGGREGATE       |1     |256.620ms|256.620ms|1      |1         |92.00 KB |-1.00 B      |FINALIZE                      |
> |11:EXCHANGE        |1     |184.430us|184.430us|20     |1         |0        |-1.00 B      |UNPARTITIONED                 |
> |06:AGGREGATE       |20    |364.045ms|1s508ms  |20     |1         |9.37 MB  |10.00 MB     |                              |
> |05:HASH JOIN       |20    |279.175ms|304.600ms|523.09K|29.99M    |155.04 MB|125.18 MB    |RIGHT SEMI JOIN, PARTITIONED  |
> |I--10:EXCHANGE     |20    |22.448ms |32.954ms |5.98M  |29.99M    |0        |0            |HASH(p_partkey)               |
> |I  04:HASH JOIN    |20    |25s417ms |35s579ms |5.98M  |29.99M    |146.02 MB|58.30 MB     |INNER JOIN, BROADCAST         |
> |I  I--09:EXCHANGE  |20    |16.270ms |35.329ms |199.30K|1.00M     |0        |0            |BROADCAST                     |
> |I  I  01:SCAN HDFS |20    |218.505ms|331.299ms|199.30K|1.00M     |173.43 MB|264.00 MB    |tpch_1000_decimal_parquet.part|
> |I  00:SCAN HDFS    |20    |1s365ms  |1s822ms  |6.00B  |6.00B     |1.92 GB  |264.00 MB    |tpch_1000_decimal_parquet.l...|
> |08:AGGREGATE       |20    |29s706ms |35s917ms |200.00M|200.05M   |1.64 GB  |167.89 MB    |FINALIZE                      |
> |07:EXCHANGE        |20    |5s081ms  |8s410ms  |3.11B  |200.05M   |0        |0            |HASH(l_partkey)               |
> |03:AGGREGATE       |20    |4m10s    |5m12s    |3.11B  |200.05M   |24.77 GB |3.28 GB      |                              |
> |02:SCAN HDFS       |20    |1s544ms  |2s517ms  |6.00B  |6.00B     |838.85 MB|176.00 MB    |tpch_1000_decimal_parquet.l...|


