[ 
https://issues.apache.org/jira/browse/HIVE-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288935#comment-15288935
 ] 

Jürgen Thomann commented on HIVE-13531:
---------------------------------------

After a second heap dump I investigated the problem further. It can be 
reproduced when this UDF is used by multiple queries at the same time.

I'm not sure which is the best way to solve this problem, but there are at 
least 2 possible fixes.
1. Change the HashCache to a synchronized Map, which is easily done with 
Collections.synchronizedMap.
2. Remove the static from the declaration of jsonObjectCache. I'm not sure why 
it is static, but if two different queries use json_tuple they currently share 
the same cache, which reduces the effective cache size for each query.
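
To make fix 1 concrete, here is a minimal sketch. It is modeled on the 
description in this issue and is not copied from the Hive source: the class 
BoundedCacheSketch and the helpers newSynchronizedCache and fillConcurrently 
are hypothetical names, and only HashCache, removeEldestEntry and the three 
constants come from the discussion here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone sketch, not the actual GenericUDTFJSONTuple code.
class BoundedCacheSketch {
    static final int CACHE_SIZE = 16;
    static final int INIT_SIZE = 32;
    static final float LOAD_FACTOR = 0.6f;

    // Bounded map: after each insert the eldest entry is dropped once the
    // size exceeds CACHE_SIZE. LinkedHashMap itself is not thread-safe.
    static class HashCache<K, V> extends LinkedHashMap<K, V> {
        HashCache() {
            super(INIT_SIZE, LOAD_FACTOR);
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > CACHE_SIZE;
        }
    }

    // Fix 1: every call on the returned map holds the wrapper's lock, so the
    // insert and the eviction triggered by it cannot interleave across threads.
    static <K, V> Map<K, V> newSynchronizedCache() {
        return Collections.synchronizedMap(new HashCache<K, V>());
    }

    // Puts distinct keys from several threads and returns the final size.
    static int fillConcurrently(Map<String, Object> cache, int threads, int putsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    cache.put(id + ":" + i, Boolean.TRUE);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.size();
    }
}
```

With the synchronized wrapper the size should stay at CACHE_SIZE no matter how 
many threads insert. Fix 2 would instead make the cache a non-static field, so 
each UDTF instance gets its own map. That only helps if a single instance is 
never used by more than one thread, which I have not verified.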

Another thing is the use of INIT_SIZE = 32 and CACHE_SIZE = 16 with a load 
factor of 0.6f. Wouldn't it make more sense to increase the load factor to 
nearly one and increase the CACHE_SIZE to 28 or something in that area?
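
The arithmetic behind this question can be sketched as follows. This is only 
an illustration: LoadFactorSketch and its methods are hypothetical names, and 
it assumes the behaviour described in the HashMap documentation, where the 
table is rehashed once the number of entries exceeds capacity times load 
factor. It also assumes the cache briefly holds CACHE_SIZE + 1 entries, since 
the eldest entry is removed after the new one has been inserted.

```java
// Hypothetical helper for reasoning about the constants, not Hive code.
class LoadFactorSketch {
    // Number of entries the table can hold before HashMap grows it.
    static int resizeThreshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // True if a cache bounded at cacheSize never triggers a resize of a
    // table with the given capacity (peak size is cacheSize + 1).
    static boolean staysWithinInitialTable(int capacity, float loadFactor, int cacheSize) {
        return cacheSize + 1 <= resizeThreshold(capacity, loadFactor);
    }
}
```

Under these assumptions the current values give a threshold of 19 for at most 
17 entries, so roughly 40% of the 32 buckets' budget is never used. A 
CACHE_SIZE of 28 would fit with a load factor of 0.95f (threshold 30) but not 
with 0.9f (threshold 28), where the table would be resized once.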

> Cache in json_tuple UDF grows larger than it should
> ---------------------------------------------------
>
>                 Key: HIVE-13531
>                 URL: https://issues.apache.org/jira/browse/HIVE-13531
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 1.1.0
>         Environment: CDH 5.5.0 with Java 1.8.0_45
>            Reporter: Jürgen Thomann
>            Assignee: Jason Dere
>            Priority: Minor
>
> According to the code in 
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java 
> the HashCache should never grow larger than 16 entries. In the last OOM of 
> Hive Server 2 I found this HashCache with over 1 million 
> java.util.LinkedHashMap$Entry objects.
> The code looks right and works as it should single-threaded when I tested it 
> in isolation. The only explanation I can imagine, with my limited knowledge 
> of the Hive source code, is that it is accessed concurrently and the cleanup 
> with removeEldestEntry does not work in that case.
> I had this problem with Hive 1.1.0 but the current implementation in master 
> looks the same for the HashCache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
