[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146107#comment-16146107 ]

Prasanth Jayachandran commented on HIVE-17304:
----------------------------------------------

The config changed because our estimates are very close to actual usage in most 
cases (at least for vectorized execution). I have seen heap dumps with 2 GB hash 
tables where the estimates in the log lines are also within 5% of 2 GB. The 
initial 2x factor was added earlier primarily for non-vectorized cases, object 
overhead, and key/value size misestimation. 
Also, the 2x factor is applied after memory oversubscription, which already gives 
the hash tables some extra room. With this patch, even the non-vectorized case is 
pretty close when ThreadMXBean info is used. The idea is to stay close to the 
noconditional task size plus the oversubscribed memory, so I relaxed the factor 
to 1.5x :)


> ThreadMXBean-based memory allocation monitoring for hash table loader
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17304
>                 URL: https://issues.apache.org/jira/browse/HIVE-17304
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on the Java data model, which can be 
> unreliable for various reasons (wrong object size estimation, new fields 
> added to a class without accounting for their size in memory monitoring, 
> etc.). We can instead use the per-thread allocation size provided by 
> ThreadMXBean and fall back to the data model when the JDK doesn't support 
> per-thread allocation tracking. 


