[
https://issues.apache.org/jira/browse/HIVE-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
JinsuKim updated HIVE-13665:
----------------------------
Attachment: patch.lst.txt
> HS2 memory leak When multiple queries are running with get_json_object
> ----------------------------------------------------------------------
>
> Key: HIVE-13665
> URL: https://issues.apache.org/jira/browse/HIVE-13665
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.1.0
> Reporter: JinsuKim
> Attachments: patch.lst.txt
>
>
> The extractObjectCache in UDFJson grows beyond its limit (CACHE_SIZE =
> 16) when multiple queries run concurrently on HS2 local (not mr/tez)
> with get_json_object or get_json_tuple.
> {code:java|title=HS2 heap_dump}
> Object at 0x515ab18f8
> instance of org.apache.hadoop.hive.ql.udf.UDFJson$HashCache@0x515ab18f8 (77
> bytes)
> Class:
> class org.apache.hadoop.hive.ql.udf.UDFJson$HashCache
> Instance data members:
> accessOrder (Z) : false
> entrySet (L) : <null>
> hashSeed (I) : 0
> header (L) : java.util.LinkedHashMap$Entry@0x515a577d0 (60 bytes)
> keySet (L) : <null>
> loadFactor (F) : 0.6
> modCount (I) : 4741146
> size (I) : 2733158 <========== here!!
> table (L) : [Ljava.util.HashMap$Entry;@0x7163d8b70 (67108880 bytes)
> threshold (I) : 5033165
> values (L) : <null>
> References to this object:
> {code}
> I think this problem is caused by the LinkedHashMap not being
> thread-safe:
> {code}
> * <p><strong>Note that this implementation is not synchronized.</strong>
> * If multiple threads access a linked hash map concurrently, and at least
> * one of the threads modifies the map structurally, it <em>must</em> be
> * synchronized externally. This is typically accomplished by
> * synchronizing on some object that naturally encapsulates the map.
> {code}
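> A minimal sketch of one possible fix (an assumption on my part; the attached patch.lst.txt may take a different approach) is to bound and synchronize the cache explicitly. The class name SynchronizedHashCache below is illustrative; the constants mirror the heap dump above (loadFactor 0.6, accessOrder false, a 16-entry limit):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: a size-bounded LRU-style cache whose mutating
// entry points are synchronized, unlike the plain LinkedHashMap subclass
// seen in the heap dump (UDFJson$HashCache).
public class SynchronizedHashCache<K, V> extends LinkedHashMap<K, V> {
    private static final int CACHE_SIZE = 16;      // limit cited in the report
    private static final float LOAD_FACTOR = 0.6f; // matches the heap dump

    public SynchronizedHashCache() {
        // insertion order (accessOrder = false), as in the heap dump
        super(CACHE_SIZE, LOAD_FACTOR, false);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the oldest entry once the cache exceeds its limit
        return size() > CACHE_SIZE;
    }

    // Synchronize structural modifications so concurrent queries on HS2
    // cannot interleave inserts/rehashes and corrupt the map.
    @Override
    public synchronized V put(K key, V value) {
        return super.put(key, value);
    }

    @Override
    public synchronized V get(Object key) {
        return super.get(key); // keep reads consistent with concurrent puts
    }
}
```

> With unsynchronized access, two threads can both pass the removeEldestEntry check before either eviction happens, which matches the dump above where size reached 2733158 despite the 16-entry limit.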
> Reproduce :
> # Run multiple queries with get_json_object and small input data (so they
> execute in HS2 local mode)
> # Take a JVM heap dump and analyze it
> {code:title=test scenario}
> 1.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body,
> '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM
> xxx.tttt WHERE part_hour='2016040105'
> 2.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body,
> '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM
> xxx.tttt WHERE part_hour='2016040106'
> 3.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body,
> '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM
> xxx.tttt WHERE part_hour='2016040107'
> 4.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body,
> '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM
> xxx.tttt WHERE part_hour='2016040108'
>
> run.sh :
> t_cnt=0
> while true
> do
> echo "query executing..."
> for i in 1 2 3 4
> do
> beeline -u jdbc:hive2://localhost:10000 -n hive --silent=true -f
> $i.hql > $i.log 2>&1 &
> done
> wait
> t_cnt=`expr $t_cnt + 1`
> echo "query count : $t_cnt"
> sleep 2
> done
> jvm heap dump & analyze :
> jmap -dump:format=b,file=hive.dmp $PID
> jhat -J-mx48000m -port 8080 hive.dmp &
> {code}
> Finally, I have attached our patch.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)