[
https://issues.apache.org/jira/browse/IMPALA-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pooja Nilangekar resolved IMPALA-7749.
--------------------------------------
Resolution: Fixed
Fix Version/s: Impala 3.1.0
> Merge aggregation node memory estimate is incorrectly influenced by limit
> -------------------------------------------------------------------------
>
> Key: IMPALA-7749
> URL: https://issues.apache.org/jira/browse/IMPALA-7749
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0
> Reporter: Tim Armstrong
> Assignee: Pooja Nilangekar
> Priority: Critical
> Fix For: Impala 3.1.0
>
>
> In the query below, the memory estimate for node 03 is too low. If you remove the
> limit, the estimate is correct.
> {noformat}
> [localhost:21000] default> set explain_level=2; explain select l_orderkey, l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5;
> EXPLAIN_LEVEL set to 2
> Query: explain select l_orderkey, l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5
>
> Max Per-Host Resource Reservation: Memory=43.94MB Threads=4
> Per-Host Resource Estimates: Memory=450MB
>
> F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 04:EXCHANGE [UNPARTITIONED]
> |  limit: 5
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  tuple-ids=1 row-size=28B cardinality=5
> |  in pipelines: 03(GETNEXT)
> |
> F01:PLAN FRAGMENT [HASH(l_orderkey,l_partkey,l_linenumber)] hosts=3 instances=3
> Per-Host Resources: mem-estimate=10.00MB mem-reservation=1.94MB thread-reservation=1
> 03:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  group by: l_orderkey, l_partkey, l_linenumber
> |  limit: 5
> |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
> |  tuple-ids=1 row-size=28B cardinality=5
> |  in pipelines: 03(GETNEXT), 00(OPEN)
> |
> 02:EXCHANGE [HASH(l_orderkey,l_partkey,l_linenumber)]
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  tuple-ids=1 row-size=28B cardinality=6001215
> |  in pipelines: 00(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
> Per-Host Resources: mem-estimate=440.27MB mem-reservation=42.00MB thread-reservation=2
> 01:AGGREGATE [STREAMING]
> |  output: count(*)
> |  group by: l_orderkey, l_partkey, l_linenumber
> |  mem-estimate=176.27MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
> |  tuple-ids=1 row-size=28B cardinality=6001215
> |  in pipelines: 00(GETNEXT)
> |
> 00:SCAN HDFS [tpch.lineitem, RANDOM]
>    partitions=1/1 files=1 size=718.94MB
>    stored statistics:
>      table: rows=6001215 size=718.94MB
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=1068457
>    mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=1
>    tuple-ids=0 row-size=20B cardinality=6001215
>    in pipelines: 00(GETNEXT)
> {noformat}
> The bug is that we use cardinality_ to cap the number of distinct values when
> computing the memory estimate, but cardinality_ has already been capped at the
> output limit. The merge aggregation must still build a hash table over every
> distinct group before the limit is applied, so the estimate should be based on
> the pre-limit cardinality.
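To make the failure mode concrete, here is a hypothetical sketch (not the actual Impala planner code; class and method names are illustrative). It contrasts an estimate derived from the limit-capped `cardinality_` with one derived from the pre-limit input cardinality, using the row size and cardinalities from the plan above:

```java
// Illustrative sketch only: shows why capping the distinct-value estimate by a
// limit-capped cardinality collapses the merge aggregation's memory estimate.
public class AggMemEstimate {
    // row-size=28B from node 03 in the plan above.
    static final long BYTES_PER_ROW = 28;

    // Buggy variant: distinct-group estimate is capped by cardinality_,
    // which has already been clamped to the LIMIT (5 here).
    static long estimateBuggy(long ndvEstimate, long limitCappedCardinality) {
        long distinctGroups = Math.min(ndvEstimate, limitCappedCardinality);
        return distinctGroups * BYTES_PER_ROW;
    }

    // Fixed variant: cap by the pre-limit input cardinality instead, since the
    // hash table must hold every distinct group before the limit applies.
    static long estimateFixed(long ndvEstimate, long inputCardinality) {
        long distinctGroups = Math.min(ndvEstimate, inputCardinality);
        return distinctGroups * BYTES_PER_ROW;
    }

    public static void main(String[] args) {
        long ndv = 6001215L;         // distinct grouping keys (every row is unique here)
        long inputRows = 6001215L;   // rows entering the merge aggregation
        long limitCapped = 5L;       // cardinality_ after LIMIT 5
        // prints 140 (≈0B in the plan) vs ~160MB, matching the symptom above.
        System.out.println("buggy bytes: " + estimateBuggy(ndv, limitCapped));
        System.out.println("fixed bytes: " + estimateFixed(ndv, inputRows));
    }
}
```

The buggy path yields a 140-byte estimate for a hash table that actually has to hold ~6M groups; the fixed path sizes it from the input side of the node.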
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)