[ 
https://issues.apache.org/jira/browse/IMPALA-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051930#comment-18051930
 ] 

Riza Suminto commented on IMPALA-14574:
---------------------------------------

Filed a patch at: [https://gerrit.cloudera.org/c/22926/]

> Lower memory estimate by analyzing Pipeline Membership
> ------------------------------------------------------
>
>                 Key: IMPALA-14574
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14574
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>
> IMPALA-7231 grouped PlanNodes into a set of Pipelines and displays that 
> information in the query profile like this:
> {code:java}
> in pipelines: 07(GETNEXT), 01(OPEN) {code}
> The meeting point between a GETNEXT and an OPEN pipeline is usually a blocking 
> operator, where all PlanNode operators that belong to the GETNEXT pipeline must 
> wait until all operators in the OPEN pipeline finish.
>  
> Examples of this are HASH JOIN,
> {code:java}
> 03:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
> |  hash-table-id=00
> |  hash predicates: i1.i_manufact = i_manufact
> |  fk/pk conjuncts: none 
> |  other predicates: zeroifnull(count(*)) > CAST(0 AS BIGINT)
> |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
> |  tuple-ids=0,2N row-size=90B cardinality=10.20K
> |  in pipelines: 00(GETNEXT), 07(OPEN)
> {code}
>  
> Final AGGREGATION,
> {code:java}
> 10:AGGREGATE [FINALIZE]
> |  group by: (i_product_name)
> |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
> |  tuple-ids=4 row-size=32B cardinality=10.20K
> |  in pipelines: 10(GETNEXT), 00(OPEN)
> {code}
>  
> SORT/TOPN,
> {code:java}
> 05:TOP-N [LIMIT=100]
> |  order by: (i_product_name) ASC
> |  mem-estimate=3.10KB mem-reservation=0B thread-reservation=0
> |  tuple-ids=5 row-size=32B cardinality=100
> |  in pipelines: 05(GETNEXT), 10(OPEN)
> {code}
> And so on.
>  
> Currently, Impala estimates the memory usage of a query by simply adding up the 
> memory estimates of all query fragments. Impala should be able to produce a lower 
> memory estimate by analyzing these pipeline dependencies in the query plan tree. 
> Fragments that belong to a GETNEXT pipeline are less likely to consume all of 
> their memory allotment until all OPEN pipelines adjacent to that GETNEXT 
> pipeline finish.
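
The difference between the two estimates can be illustrated with a rough sketch. This is not the algorithm in the patch and not Impala code: the PlanNode class, the function names, and every memory figure below are hypothetical. It also makes a strong simplifying assumption that only one pipeline is active at a time, so the peak is the largest per-pipeline total rather than the sum over all nodes.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class PlanNode:
    """Hypothetical stand-in for a plan node; not an Impala class."""
    name: str
    mem_estimate: int      # bytes
    pipelines: frozenset   # ids of the pipelines this node participates in


def naive_estimate(nodes):
    """Sum every node's estimate, as if all nodes peak simultaneously."""
    return sum(node.mem_estimate for node in nodes)


def pipeline_aware_estimate(nodes):
    """Largest per-pipeline total.

    Assumption (illustration only): pipelines run one at a time, so only
    the nodes that are members of the active pipeline hold memory.
    A blocking node belongs to two pipelines and is counted in both.
    """
    per_pipeline = {}
    for node in nodes:
        for pipeline_id in node.pipelines:
            per_pipeline[pipeline_id] = (
                per_pipeline.get(pipeline_id, 0) + node.mem_estimate
            )
    return max(per_pipeline.values(), default=0)


# Node names mirror the plan excerpts above; the byte values are invented.
EXAMPLE_NODES = [
    PlanNode("00:SCAN", 16 * MB, frozenset({"00"})),
    PlanNode("07:SCAN", 8 * MB, frozenset({"07"})),
    PlanNode("03:HASH JOIN", 4 * MB, frozenset({"00", "07"})),
    PlanNode("10:AGGREGATE", 10 * MB, frozenset({"10", "00"})),
    PlanNode("05:TOP-N", 1 * MB, frozenset({"05", "10"})),
]

print(naive_estimate(EXAMPLE_NODES) // MB, "MB naive")
print(pipeline_aware_estimate(EXAMPLE_NODES) // MB, "MB pipeline-aware")
```

With these made-up numbers the naive sum is 39 MB, while the largest single pipeline (00, containing the scan, the join probe, and the aggregation build) totals 30 MB. A real estimator would also have to account for pipelines that overlap in time and for memory a blocking operator retains after its OPEN pipeline finishes.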



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
