[ https://issues.apache.org/jira/browse/IMPALA-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-10377:
------------------------------------
Fix Version/s: Impala 4.0.0
> Improve the accuracy of resource estimation
> -------------------------------------------
>
> Key: IMPALA-10377
> URL: https://issues.apache.org/jira/browse/IMPALA-10377
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.4.0
> Reporter: liuyao
> Assignee: liuyao
> Priority: Major
> Labels: estimate, memory, statistics
> Fix For: Impala 4.0.0
>
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> PlanNode ignores several factors when estimating memory, which causes
> large estimation errors.
>
> AggregationNode
>
> 1. The memory occupied by the hash table's own data structure is not
> considered. Each new value inserted into the hash table adds a bucket, and
> a bucket is 16 bytes.
> 2. When estimating the NDV of a merge aggregation with multiple grouping
> exprs, the NDV may be divided by the number of fragment instances several
> times; it should be divided only once.
> 3. When estimating the NDV of a merge aggregation with multiple grouping
> exprs, the estimated memory is much smaller than the actual usage.
> 4. If there are no grouping exprs, the estimated memory is much larger than
> the actual usage.
> 5. If the NDV of the grouping exprs is very small, the estimated memory is
> much larger than the actual usage.
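
A minimal sketch of points 1 and 2 above, not the actual planner code. The function names, the per-expr division used to illustrate the over-division, and the example inputs are assumptions made for illustration; only the 16-byte bucket size comes from the issue text.

```python
from math import prod

BUCKET_BYTES = 16  # bucket size quoted in the issue description


def ndv_divided_per_expr(ndvs, num_instances):
    """Illustrates the reported problem: one division per grouping expr."""
    return prod(max(1.0, ndv / num_instances) for ndv in ndvs)


def ndv_divided_once(ndvs, num_instances):
    """Combine the per-expr NDVs first, then divide a single time."""
    return max(1.0, prod(ndvs) / num_instances)


def agg_hash_table_bytes(num_groups, avg_row_bytes):
    """Row data plus one bucket per distinct group."""
    return num_groups * (avg_row_bytes + BUCKET_BYTES)


if __name__ == "__main__":
    ndvs, instances = [1000, 100], 10  # hypothetical inputs
    print(ndv_divided_per_expr(ndvs, instances))
    print(ndv_divided_once(ndvs, instances))
    print(agg_hash_table_bytes(ndv_divided_once(ndvs, instances), 32))
```

With two grouping exprs, dividing per expr shrinks the group count by the square of the instance count, which is one way an estimate could end up far below actual usage.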
>
> SortNode
> 1. When estimating the memory usage of an external sort, the estimated
> memory is much smaller than the actual usage.
>
>
> HashJoinNode
> 1. The memory occupied by the hash table's own data structure is not
> considered. The hash table keeps duplicate data, so the size of
> DuplicateNode should be considered.
> 2. The hash table creates multiple buckets in advance. The size of these
> buckets should be considered.
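
A rough sketch of how the two HashJoinNode points could enter an estimate. Everything here is a labeled assumption rather than the real implementation: the bucket size reuses the 16 bytes quoted under AggregationNode, and the DuplicateNode size, the minimum bucket count, the power-of-two rounding, and the "one node per row beyond the first per key" model are hypothetical.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (and at least 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def join_hash_table_bytes(build_rows, build_ndv, avg_row_bytes,
                          bucket_bytes=16,          # assumed, from the AggregationNode note
                          duplicate_node_bytes=16,  # hypothetical size
                          min_buckets=1024):        # hypothetical pre-created bucket count
    # Buckets created in advance: never fewer than min_buckets.
    num_buckets = max(min_buckets, next_power_of_two(build_ndv))
    # Simplified model: every row beyond the first per key needs a DuplicateNode.
    num_duplicates = max(0, build_rows - build_ndv)
    return (build_rows * avg_row_bytes
            + num_buckets * bucket_bytes
            + num_duplicates * duplicate_node_bytes)


if __name__ == "__main__":
    print(join_hash_table_bytes(build_rows=3000, build_ndv=1000, avg_row_bytes=40))
```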
>
> KuduScanNode
> 1. Memory is estimated as if all columns were scanned, so the estimated
> memory is much larger than the actual usage.
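
To illustrate the KuduScanNode point, a small sketch comparing a row-size estimate over all columns with one over only the columns a query reads. The column names, byte widths, and batch size are made up for the example, and the functions are hypothetical, not Impala APIs.

```python
def row_bytes(column_bytes, projected=None):
    """Sum of column widths; all columns when no projection is given."""
    names = column_bytes.keys() if projected is None else projected
    return sum(column_bytes[name] for name in names)


def scan_buffer_bytes(column_bytes, rows_per_batch, projected=None):
    """Estimated bytes for one batch of scanned rows."""
    return rows_per_batch * row_bytes(column_bytes, projected)


if __name__ == "__main__":
    cols = {"id": 8, "name": 32, "payload": 256}  # hypothetical table
    print(scan_buffer_bytes(cols, 1024))                  # all columns
    print(scan_buffer_bytes(cols, 1024, ["id", "name"]))  # projected only
```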
--
This message was sent by Atlassian Jira
(v8.20.1#820001)