[
https://issues.apache.org/jira/browse/IMPALA-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051930#comment-18051930
]
Riza Suminto commented on IMPALA-14574:
---------------------------------------
Filed a patch at: [https://gerrit.cloudera.org/c/22926/]
> Lower memory estimate by analyzing Pipeline Membership
> ------------------------------------------------------
>
> Key: IMPALA-14574
> URL: https://issues.apache.org/jira/browse/IMPALA-14574
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Riza Suminto
> Assignee: Riza Suminto
> Priority: Major
>
> IMPALA-7231 groups PlanNodes into a set of Pipelines and displays that
> information in the query profile like this:
> {code:java}
> in pipelines: 07(GETNEXT), 01(OPEN) {code}
> A meeting point between a GETNEXT and an OPEN pipeline is usually a blocking
> operator, where all PlanNode operators that belong to the GETNEXT pipeline must
> wait until all operators in the OPEN pipeline finish.
>
> Examples of this are HASH JOIN,
> {code:java}
> 03:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
> | hash-table-id=00
> | hash predicates: i1.i_manufact = i_manufact
> | fk/pk conjuncts: none
> | other predicates: zeroifnull(count(*)) > CAST(0 AS BIGINT)
> | mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
> | tuple-ids=0,2N row-size=90B cardinality=10.20K
> | in pipelines: 00(GETNEXT), 07(OPEN)
> {code}
>
> Final AGGREGATION,
> {code:java}
> 10:AGGREGATE [FINALIZE]
> | group by: (i_product_name)
> | mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
> | tuple-ids=4 row-size=32B cardinality=10.20K
> | in pipelines: 10(GETNEXT), 00(OPEN)
> {code}
>
> SORT/TOPN,
> {code:java}
> 05:TOP-N [LIMIT=100]
> | order by: (i_product_name) ASC
> | mem-estimate=3.10KB mem-reservation=0B thread-reservation=0
> | tuple-ids=5 row-size=32B cardinality=100
> | in pipelines: 05(GETNEXT), 10(OPEN)
> {code}
> And so on.
>
> Currently, Impala estimates the memory usage of a query by simply adding up the
> memory estimates of all query fragments. Impala should be able to produce a
> lower estimate by analyzing these pipeline dependencies in the query plan tree.
> Fragments that belong to a GETNEXT pipeline are less likely to consume all of
> their memory allotment until all OPEN pipelines adjacent to that GETNEXT
> pipeline finish.
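
To make the idea in the description concrete, here is a minimal Java sketch that compares the plain sum of per-node estimates with a pipeline-aware peak. It is not the algorithm in the Gerrit patch. The `Node` class, the method names, and all numbers are hypothetical, and the model assumes (as a simplification) that only one pipeline is active at a time and that a node holds its memory in every pipeline it is a member of, whether as GETNEXT or OPEN.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PipelineMemorySketch {
  /** Hypothetical stand-in for a plan node: an estimate plus its pipeline membership. */
  public static final class Node {
    final String label;
    final long memEstimateMb;   // illustrative value in MB, not taken from a real plan
    final int[] pipelineIds;    // every pipeline the node belongs to (GETNEXT or OPEN)

    public Node(String label, long memEstimateMb, int... pipelineIds) {
      this.label = label;
      this.memEstimateMb = memEstimateMb;
      this.pipelineIds = pipelineIds;
    }
  }

  /** Baseline described in the issue: add up every node's estimate. */
  public static long naiveSum(List<Node> nodes) {
    long total = 0;
    for (Node n : nodes) total += n.memEstimateMb;
    return total;
  }

  /**
   * Pipeline-aware estimate under the one-pipeline-at-a-time assumption:
   * sum the estimates of the members of each pipeline, then take the largest sum.
   */
  public static long peakPerPipeline(List<Node> nodes) {
    Map<Integer, Long> perPipeline = new HashMap<>();
    for (Node n : nodes) {
      for (int id : n.pipelineIds) {
        perPipeline.merge(id, n.memEstimateMb, Long::sum);
      }
    }
    long peak = 0;
    for (long sum : perPipeline.values()) peak = Math.max(peak, sum);
    return peak;
  }

  /** Hypothetical plan shaped like the examples: scan -> join -> aggregate -> top-n. */
  public static List<Node> examplePlan() {
    return Arrays.asList(
        new Node("SCAN probe side", 64, 0),   // pipeline 0 (GETNEXT)
        new Node("SCAN build side", 16, 1),   // pipeline 1 (GETNEXT)
        new Node("HASH JOIN", 32, 0, 1),      // 0 (GETNEXT), 1 (OPEN)
        new Node("AGGREGATE", 100, 2, 0),     // 2 (GETNEXT), 0 (OPEN)
        new Node("TOP-N", 1, 3, 2));          // 3 (GETNEXT), 2 (OPEN)
  }

  public static void main(String[] args) {
    List<Node> plan = examplePlan();
    System.out.println("naive sum:     " + naiveSum(plan) + " MB");
    System.out.println("pipeline peak: " + peakPerPipeline(plan) + " MB");
  }
}
```

With these made-up numbers the naive sum is 213 MB, while the largest single pipeline (pipeline 0: probe-side scan, hash join, and aggregate) needs 196 MB. The build-side scan and the TOP-N node never overlap with that peak in this simplified model, which is the kind of saving the issue is after.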
--
This message was sent by Atlassian Jira
(v8.20.10#820010)