[
https://issues.apache.org/jira/browse/HIVE-20192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551100#comment-16551100
]
Vihang Karajgaonkar commented on HIVE-20192:
--------------------------------------------
{quote} The PersistenceManagerFactory object "pmf" is a static object which
keeps references of the allocated PersistenceManager in pmCache Map. That's why
PersistenceManager doesn't get GC'ed and need explicit shutdown for any
exception. In this case we retry instead of closing the thread which overwrites
the pm object and leaks the old one. {quote}
I see. Thanks for the explanation.
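To make the quoted explanation concrete, here is a minimal sketch of the leak pattern. It is not Hive or DataNucleus code: {{Factory}}, {{Manager}} and both retry methods are hypothetical stand-ins, assuming only what the quote states, namely that a long-lived factory keeps a reference to every manager it hands out until that manager is explicitly closed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class PmCacheLeakSketch {

    /** Hypothetical stand-in for a factory that tracks every manager it creates. */
    static final class Factory {
        private final Set<Manager> cache = ConcurrentHashMap.newKeySet();

        Manager newManager() {
            Manager m = new Manager(this);
            cache.add(m); // the factory now holds a strong reference
            return m;
        }

        int openCount() {
            return cache.size();
        }
    }

    /** Hypothetical stand-in for a manager that must be closed explicitly. */
    static final class Manager {
        private final Factory owner;

        Manager(Factory owner) {
            this.owner = owner;
        }

        void close() {
            owner.cache.remove(this); // only an explicit close drops the reference
        }
    }

    /** Retry by overwriting the local variable: every old manager stays cached. */
    static int retryWithoutClose(int retries) {
        Factory pmf = new Factory();
        Manager pm = pmf.newManager();
        for (int i = 0; i < retries; i++) {
            pm = pmf.newManager(); // old pm is unreachable here but still in the cache
        }
        return pmf.openCount();
    }

    /** Retry after closing the old manager: only the latest one stays cached. */
    static int retryWithClose(int retries) {
        Factory pmf = new Factory();
        Manager pm = pmf.newManager();
        for (int i = 0; i < retries; i++) {
            pm.close();
            pm = pmf.newManager();
        }
        return pmf.openCount();
    }

    public static void main(String[] args) {
        System.out.println(retryWithoutClose(3)); // 4 managers still cached
        System.out.println(retryWithClose(3));    // 1 manager still cached
    }
}
```

In this model the number of cached managers grows with every retry unless the old manager is closed first, which is consistent with the large pmCache seen in the heap report.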
{quote}I think, overwriting the entry by cacheThreadLocalRawStore doesn't cause
any leak, because, it overwrites with thread local rawStore which is active in
this thread. If the thread local rawStore is changed, it means, the older one
was already shutdown gracefully before re-create. Also, threadRawStoreMap
shouldn't pile up as we use the same thread id. {quote}
I think you are right. The cleanup model looks optimistic: when a thread is
reused, the {{Hive#getInternal}} method checks whether the thread-local
rawstore can be reused and cleans it up if the owner is different or the
config is not compatible. So thread reuse looks safe, because the entry being
overwritten in {{ThreadWithGarbageCleanup.threadRawStoreMap}} is either
replaced with the same object or replaced only after the previous one was
closed. That code path looks good to me. This is all very tricky business, and
I hope there is no other code path that is still leaking the rawstore.
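The overwrite behaviour discussed above can be sketched as follows. This is a simplified model, not the actual Hive implementation: {{Store}}, {{Registry}} and {{recreatedStoreLeaks}} are hypothetical names, and modelling "not allowed to be updated" as {{putIfAbsent}} is an assumption made purely for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadStoreMapSketch {

    /** Hypothetical stand-in for a store that holds resources until shutdown. */
    static final class Store {
        boolean open = true;

        void shutdown() {
            open = false;
        }
    }

    /** Hypothetical per-thread registry, keyed by thread id. */
    static final class Registry {
        private final Map<Long, Store> storeByThread = new ConcurrentHashMap<>();

        /** Record the store currently active in the given thread. */
        void cache(long threadId, Store current, boolean overwrite) {
            if (overwrite) {
                storeByThread.put(threadId, current);         // entry follows the live store
            } else {
                storeByThread.putIfAbsent(threadId, current); // entry stays on the first store
            }
        }

        /** Cleanup hook run when the thread exits: shut down whatever is mapped. */
        void onThreadExit(long threadId) {
            Store mapped = storeByThread.remove(threadId);
            if (mapped != null) {
                mapped.shutdown();
            }
        }
    }

    /** Returns true if the re-created store is still open after the thread exits. */
    static boolean recreatedStoreLeaks(boolean overwrite) {
        Registry registry = new Registry();
        long threadId = 1L;

        Store first = new Store();
        registry.cache(threadId, first, overwrite);

        first.shutdown();           // the older store is closed before re-creation
        Store second = new Store(); // the store is re-created in the same thread
        registry.cache(threadId, second, overwrite);

        registry.onThreadExit(threadId);
        return second.open;
    }

    public static void main(String[] args) {
        System.out.println(recreatedStoreLeaks(false)); // true: second store never shut down
        System.out.println(recreatedStoreLeaks(true));  // false: second store cleaned up
    }
}
```

Because the key is the thread id, overwriting keeps one entry per thread, so the map does not pile up, and the entry always points at the store that the exit hook needs to close.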
This patch looks good to me. +1
> HS2 with embedded metastore is leaking JDOPersistenceManager objects.
> ---------------------------------------------------------------------
>
> Key: HIVE-20192
> URL: https://issues.apache.org/jira/browse/HIVE-20192
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 3.0.0, 3.1.0, 4.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Major
> Labels: HiveServer2, pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-20192.01.patch
>
>
> Hiveserver2 instances were crashing every 3-4 days, and HS2 was observed in
> an unresponsive state. FGC was also observed happening regularly.
> The JXray report shows that pmCache (a collection of JDOPersistenceManager
> objects) is occupying 84% of the heap and that there are around 16,000
> references to UDFClassLoader.
> {code:java}
> 10,759,230K (84.7%) Object tree for GC root(s) Java Static
> org.apache.hadoop.hive.metastore.ObjectStore.pmf
> - org.datanucleus.api.jdo.JDOPersistenceManagerFactory.pmCache ↘ 10,744,419K
> (84.6%), 1 reference(s)
> - j.u.Collections$SetFromMap.m ↘ 10,744,419K (84.6%), 1 reference(s)
> - {java.util.concurrent.ConcurrentHashMap}.keys ↘ 10,743,764K (84.5%),
> 16,872 reference(s)
> - org.datanucleus.api.jdo.JDOPersistenceManager.ec ↘ 10,738,831K
> (84.5%), 16,872 reference(s)
> ... 3 more references together retaining 4,933K (< 0.1%)
> - java.util.concurrent.ConcurrentHashMap self 655K (< 0.1%), 1 object(s)
> ... 2 more references together retaining 48b (< 0.1%)
> - org.datanucleus.api.jdo.JDOPersistenceManagerFactory.nucleusContext ↘
> 14,810K (0.1%), 1 reference(s)
> ... 3 more references together retaining 96b (< 0.1%){code}
> When the RawStore object is re-created, it is not allowed to be updated in
> ThreadWithGarbageCleanup.threadRawStoreMap, so the new RawStore never gets
> cleaned up when the thread exits.
>