[
https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146107#comment-16146107
]
Prasanth Jayachandran commented on HIVE-17304:
----------------------------------------------
The factor changed because actual usage is usually very close to the estimates,
at least in the vectorized case. I have seen heap dumps with 2GB hash tables
where the estimates from the log lines were also very close to 2GB (within 5%).
The initial 2x factor was added primarily for non-vectorized cases, object
overhead, and key/value size misestimation.
Also, the 2x factor is applied after memory oversubscription, which already
gives hash tables some extra room. With this patch, even the non-vectorized
case gets pretty close when ThreadMXBean info is used. The idea is to stay
close to the noconditional task size plus the oversubscribed memory, so I
relaxed the factor to 1.5x :)
> ThreadMXBean based memory allocation monitoring for hash table loader
> ---------------------------------------------------------------------
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on the Java data model, which can be
> unreliable for various reasons (wrong object size estimation, adding new
> variables to a class without accounting for their size in memory monitoring,
> etc.). We can instead use the per-thread allocation size provided by
> ThreadMXBean, falling back to the data model in case the JDK doesn't support
> per-thread allocation tracking.
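For reference, a minimal sketch of the described approach, not the patch
itself. It uses the real com.sun.management.ThreadMXBean API
(isThreadAllocatedMemorySupported, getThreadAllocatedBytes); the surrounding
class and the fallback interface are hypothetical names for illustration:
{code:java}
import java.lang.management.ManagementFactory;

// Sketch: read per-thread allocated bytes via the com.sun.management
// extension of ThreadMXBean, falling back to a data-model based estimate
// when the JVM does not support or enable allocation tracking.
public class AllocationMonitorSketch {
  // Hypothetical stand-in for Hive's JavaDataModel-based estimation path.
  interface FallbackEstimator { long estimatedBytes(); }

  private final com.sun.management.ThreadMXBean mxBean;

  AllocationMonitorSketch() {
    java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    // The allocation counters live on the com.sun.management subtype only.
    mxBean = (bean instanceof com.sun.management.ThreadMXBean)
        ? (com.sun.management.ThreadMXBean) bean : null;
  }

  boolean allocationTrackingAvailable() {
    return mxBean != null
        && mxBean.isThreadAllocatedMemorySupported()
        && mxBean.isThreadAllocatedMemoryEnabled();
  }

  // Bytes allocated by the current thread since the given baseline snapshot;
  // the caller diffs snapshots taken before and after the hash table load.
  long usedBytes(long baselineBytes, FallbackEstimator fallback) {
    if (allocationTrackingAvailable()) {
      return mxBean.getThreadAllocatedBytes(Thread.currentThread().getId())
          - baselineBytes;
    }
    return fallback.estimatedBytes();
  }
}
{code}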