liuyao created IMPALA-10377:
-------------------------------
Summary: Improve the accuracy of resource estimation
Key: IMPALA-10377
URL: https://issues.apache.org/jira/browse/IMPALA-10377
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: liuyao
Fix For: Impala 4.0
PlanNode does not consider several factors when estimating memory, which leads
to large estimation errors.
AggregationNode
1. The memory occupied by the hash table's own data structure is not considered.
Each new value inserted into the hash table adds a bucket, and each bucket is
16 bytes.
2. When estimating the NDV of a merge aggregation with multiple grouping exprs,
the NDV may be divided by the number of fragment instances several times; it
should be divided only once.
3. When estimating the NDV of a merge aggregation with multiple grouping exprs,
the estimated memory is much smaller than the actual usage.
4. If there are no grouping exprs, the estimated memory is much larger than the
actual usage.
5. If the NDV of the grouping exprs is very small, the estimated memory is much
larger than the actual usage.
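
Items 1 and 2 can be illustrated with a rough sketch. It is not Impala's
planner code: the function names are hypothetical, and it assumes the combined
NDV is the product of the per-expression NDVs (ignoring any cap by input
cardinality). Only the 16-byte bucket size comes from this issue.

```python
import math

BUCKET_BYTES = 16  # bucket size stated in this issue


def merge_agg_ndv(per_expr_ndvs, num_instances):
    """Estimate distinct groups per instance for a merge aggregation.

    Assumption: the combined NDV is the product of the per-expression NDVs.
    The division by the instance count is applied once, to that product,
    rather than once per grouping expression.
    """
    combined = math.prod(per_expr_ndvs)
    return max(1, math.ceil(combined / num_instances))


def agg_mem_estimate(num_groups, avg_row_bytes):
    """Row data plus one hash table bucket per inserted group."""
    return num_groups * (avg_row_bytes + BUCKET_BYTES)


if __name__ == "__main__":
    groups = merge_agg_ndv([100, 50], num_instances=10)
    print(groups, agg_mem_estimate(groups, avg_row_bytes=32))
```

Dividing each of the two NDVs by 10 before multiplying would give 10 * 5 = 50
groups instead of 5000 / 10 = 500, which is the kind of underestimate described
in item 2.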
SortNode
1. When estimating the memory usage of an external sort, the estimated memory is
much smaller than the actual usage.
HashJoinNode
1. The memory occupied by the hash table's own data structure is not considered.
The hash table keeps duplicate data, so the size of DuplicateNode should be
considered.
2. The hash table creates multiple buckets in advance. The size of these buckets
should be considered.
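
A minimal sketch of an estimate that includes both kinds of overhead follows.
It is an illustration, not Impala's implementation. The function and its
parameters are hypothetical, the DuplicateNode size and the initial bucket
count are inputs because this issue does not state them, and the 16-byte
bucket size is borrowed from the AggregationNode item above on the assumption
that both nodes use the same bucket layout.

```python
def hash_join_mem_estimate(build_rows, build_ndv, avg_row_bytes,
                           duplicate_node_bytes, initial_buckets,
                           bucket_bytes=16):
    """Estimate build-side memory for a hash join, in bytes.

    Assumptions (not confirmed by the issue):
      - one bucket per distinct key, but never fewer than the buckets
        created in advance;
      - one duplicate node for every row beyond the first of each key.
    """
    row_bytes = build_rows * avg_row_bytes
    num_buckets = max(initial_buckets, build_ndv)
    num_duplicates = max(0, build_rows - build_ndv)
    return (row_bytes
            + num_buckets * bucket_bytes
            + num_duplicates * duplicate_node_bytes)


if __name__ == "__main__":
    print(hash_join_mem_estimate(build_rows=1000, build_ndv=100,
                                 avg_row_bytes=20, duplicate_node_bytes=24,
                                 initial_buckets=256))
```

With many duplicates per key, the duplicate-node term can be as large as the
row data itself, which is why leaving it out skews the estimate.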
KuduScanNode
1. Memory is estimated as if all columns were scanned, so the estimated memory is
much larger than the actual usage.
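
The difference can be sketched as follows. This is only an illustration under
assumed inputs: the function, the column names, the per-column byte sizes and
the batch size are all hypothetical and do not come from Impala or Kudu.

```python
def kudu_scan_mem_estimate(column_bytes, projected_columns, rows_per_batch):
    """Estimate scan buffer memory from the columns the query reads.

    column_bytes maps a column name to its assumed average size in bytes.
    Only the projected columns contribute to the per-row size.
    """
    per_row = sum(column_bytes[name] for name in projected_columns)
    return per_row * rows_per_batch


if __name__ == "__main__":
    columns = {"id": 8, "name": 32, "payload": 1024}  # hypothetical table
    all_cols = kudu_scan_mem_estimate(columns, list(columns), 1000)
    projected = kudu_scan_mem_estimate(columns, ["id", "name"], 1000)
    print(all_cols, projected)
```

A query that reads only two narrow columns from a wide table needs a small
fraction of the memory that an all-columns estimate would reserve.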