Riza Suminto created IMPALA-14574:
-------------------------------------

             Summary: Lower memory estimate by analyzing Pipeline Membership
                 Key: IMPALA-14574
                 URL: https://issues.apache.org/jira/browse/IMPALA-14574
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Riza Suminto
            Assignee: Riza Suminto


IMPALA-7231 groups PlanNodes into a set of pipelines and displays that 
information in the query profile like this:
{code:java}
in pipelines: 07(GETNEXT), 01(OPEN) {code}
The meeting point between a GETNEXT pipeline and an OPEN pipeline is usually a 
blocking operator: all PlanNode operators that belong to the GETNEXT pipeline 
must wait until all operators in the OPEN pipeline finish.
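For illustration only, the membership line above can be split into 
(pipeline id, phase) pairs. The following is a minimal Java sketch; the class 
PipelineLineParser, its Phase enum and Membership record are hypothetical names, 
not part of Impala's frontend, and the regular expression assumes every entry 
looks like NN(GETNEXT) or NN(OPEN).
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Impala: parses an "in pipelines:" profile
// line into (pipeline id, phase) pairs.
class PipelineLineParser {
  // Role a PlanNode plays in a pipeline, as printed in the profile.
  enum Phase { GETNEXT, OPEN }

  // One membership entry, e.g. "07(GETNEXT)".
  record Membership(String pipelineId, Phase phase) {}

  // Matches entries such as "07(GETNEXT)" or "01(OPEN)".
  private static final Pattern ENTRY =
      Pattern.compile("(\\d+)\\((GETNEXT|OPEN)\\)");

  static List<Membership> parse(String line) {
    List<Membership> result = new ArrayList<>();
    Matcher m = ENTRY.matcher(line);
    while (m.find()) {
      result.add(new Membership(m.group(1), Phase.valueOf(m.group(2))));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Membership> ms = parse("in pipelines: 07(GETNEXT), 01(OPEN)");
    for (Membership mb : ms) {
      System.out.println(mb.pipelineId() + " " + mb.phase());
    }
    // Prints:
    // 07 GETNEXT
    // 01 OPEN
  }
}
{code}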


Examples of such blocking operators are HASH JOIN,
{code:java}
03:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
|  hash-table-id=00
|  hash predicates: i1.i_manufact = i_manufact
|  fk/pk conjuncts: none 
|  other predicates: zeroifnull(count(*)) > CAST(0 AS BIGINT)
|  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=0,2N row-size=90B cardinality=10.20K
|  in pipelines: 00(GETNEXT), 07(OPEN)
{code}

Final AGGREGATION,
{code:java}
10:AGGREGATE [FINALIZE]
|  group by: (i_product_name)
|  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=4 row-size=32B cardinality=10.20K
|  in pipelines: 10(GETNEXT), 00(OPEN)
{code}
SORT/TOPN,
{code:java}
05:TOP-N [LIMIT=100]
|  order by: (i_product_name) ASC
|  mem-estimate=3.10KB mem-reservation=0B thread-reservation=0
|  tuple-ids=5 row-size=32B cardinality=100
|  in pipelines: 05(GETNEXT), 10(OPEN)
{code}
And so on.
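Taken together, these memberships induce an ordering on pipelines: for each 
blocking operator, its OPEN pipeline has to finish before operators in its 
GETNEXT pipeline can proceed. The sketch below is a hypothetical helper, not 
Impala code. It turns each blocking node into an OPEN -> GETNEXT edge and orders 
the pipelines with Kahn's topological sort, using the memberships of the three 
examples above. It assumes each blocking node has exactly one GETNEXT and one 
OPEN pipeline; the PipelineOrder and Blocking names are illustrative.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: derive a pipeline order from blocking-node memberships.
// Model: a node in pipeline G as GETNEXT and pipeline O as OPEN means O must
// finish before G proceeds, i.e. an edge O -> G.
class PipelineOrder {
  record Blocking(String node, String getNextPipeline, String openPipeline) {}

  static List<String> order(List<Blocking> nodes) {
    Map<String, Set<String>> successors = new TreeMap<>();
    Map<String, Integer> inDegree = new TreeMap<>();
    for (Blocking b : nodes) {
      String open = b.openPipeline();
      String getNext = b.getNextPipeline();
      successors.computeIfAbsent(open, k -> new TreeSet<>());
      successors.computeIfAbsent(getNext, k -> new TreeSet<>());
      inDegree.putIfAbsent(open, 0);
      inDegree.putIfAbsent(getNext, 0);
      // Count each distinct OPEN -> GETNEXT edge only once.
      if (successors.get(open).add(getNext)) {
        inDegree.merge(getNext, 1, Integer::sum);
      }
    }
    // Kahn's algorithm; the PriorityQueue makes tie-breaking deterministic.
    PriorityQueue<String> ready = new PriorityQueue<>();
    inDegree.forEach((p, d) -> { if (d == 0) ready.add(p); });
    List<String> result = new ArrayList<>();
    while (!ready.isEmpty()) {
      String p = ready.poll();
      result.add(p);
      for (String next : successors.get(p)) {
        if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
      }
    }
    if (result.size() != inDegree.size()) {
      throw new IllegalStateException("cycle in pipeline dependencies");
    }
    return result;
  }

  public static void main(String[] args) {
    // Memberships copied from the HASH JOIN, AGGREGATE and TOP-N examples.
    List<Blocking> nodes = List.of(
        new Blocking("03:HASH JOIN", "00", "07"),
        new Blocking("10:AGGREGATE", "10", "00"),
        new Blocking("05:TOP-N", "05", "10"));
    System.out.println(order(nodes)); // prints [07, 00, 10, 05]
  }
}
{code}
A cycle would indicate inconsistent input, so the sketch throws instead of 
returning a partial order.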


Currently, Impala estimates the memory usage of a query by simply adding up 
the memory estimates of all query fragments. Impala should be able to produce 
a lower estimate by analyzing these pipeline dependencies in the query plan 
tree. Fragments that belong to a GETNEXT pipeline are less likely to consume 
their full memory allotment until all OPEN pipelines adjacent to that GETNEXT 
pipeline finish.
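One way to make this concrete is to compare the current sum with the peak of 
memory that is live at any one time. The sketch below is hypothetical and 
deliberately simplified: it works on plan nodes rather than fragments, assumes 
pipelines run one at a time in a given order (here 01, 07, 00, 10, 05, which 
is consistent with the OPEN-before-GETNEXT relation in the examples above, 
assuming they come from one plan), and assumes a node holds its full estimate 
from the first to the last pipeline it belongs to. The memory figures are 
invented; they are not the mem-estimate values shown above.
{code:java}
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Impala's estimator. Assumptions:
//  1. pipelines run one at a time, in the order given;
//  2. a node holds its full estimate from the first to the last pipeline it
//     belongs to, and nothing outside that span.
class PipelineMemEstimate {
  // A plan node with an (illustrative) memory estimate in MB and the ids of
  // the pipelines it appears in.
  record Node(String name, long memMb, Set<String> pipelines) {}

  // Current behavior described in this issue: add up every estimate.
  static long naiveSum(List<Node> nodes) {
    long total = 0;
    for (Node n : nodes) total += n.memMb();
    return total;
  }

  // Peak of the memory that is live while each pipeline runs.
  static long pipelineAwarePeak(List<Node> nodes, List<String> order) {
    long[] live = new long[order.size()];
    for (Node n : nodes) {
      if (n.pipelines().isEmpty()) {
        throw new IllegalArgumentException(n.name() + " has no pipeline");
      }
      int first = Integer.MAX_VALUE;
      int last = Integer.MIN_VALUE;
      for (String p : n.pipelines()) {
        int i = order.indexOf(p);
        if (i < 0) throw new IllegalArgumentException("unknown pipeline " + p);
        first = Math.min(first, i);
        last = Math.max(last, i);
      }
      // The node's memory is counted for every pipeline in [first, last].
      for (int i = first; i <= last; i++) live[i] += n.memMb();
    }
    long peak = 0;
    for (long v : live) peak = Math.max(peak, v);
    return peak;
  }

  public static void main(String[] args) {
    // Pipeline ids follow the examples above; memory figures are made up.
    List<String> order = List.of("01", "07", "00", "10", "05");
    List<Node> nodes = List.of(
        new Node("leaf of 01", 32, Set.of("01")),
        new Node("node in 07(GETNEXT), 01(OPEN)", 8, Set.of("07", "01")),
        new Node("03:HASH JOIN", 40, Set.of("00", "07")),
        new Node("leaf of 00", 16, Set.of("00")),
        new Node("10:AGGREGATE", 10, Set.of("10", "00")),
        new Node("05:TOP-N", 1, Set.of("05", "10")));
    System.out.println(naiveSum(nodes));                 // 107
    System.out.println(pipelineAwarePeak(nodes, order)); // 66
  }
}
{code}
With these made-up numbers the naive sum is 107 MB while the modeled peak is 
66 MB, reached while pipeline 00 runs. In a real plan several pipelines may be 
active at once, so an actual estimator would need to account for overlap that 
this model ignores.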



--
This message was sent by Atlassian Jira
(v8.20.10#820010)