[jira] [Commented] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)

Daryn Sharp (JIRA) Fri, 29 Jun 2012 08:29:46 -0700

    [ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403970#comment-13403970 ]


Daryn Sharp commented on HIVE-3098:
-----------------------------------

You can't reasonably expect or know that any code called, now or in the future, 
by the request isn't going to implicitly alter the {{Subject}} with the 
addition of tokens/TGT/service tickets/etc.

bq. Your solution of closeAllForUGI means you have to keep the original UGI; if 
you keep recreating them, you are back to square one.

The UGI is basically a wrapper around the javax security context's {{Subject}} 
which is created by the authentication process.  By design, the UGI/Subject is 
a context for a specific request/connection -- not for an indeterminate number 
of future requests.  A UGI should be created when a request comes in and used 
throughout the request, rather than creating a new UGI multiple times during 
the request.  If the request operates in a {{doAs}} block, you don't have to 
explicitly remember the UGI.  You can simply call 
{{FileSystem.closeAllForUGI(UserGroupInformation.getCurrentUser())}} when the 
request is done.
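To make the per-request lifecycle concrete, here is a minimal, self-contained 
model in plain Java.  It does not use Hadoop: {{Ugi}}, {{FsCache}}, 
{{closeAllForUgi}} and {{handleRequest}} are hypothetical stand-ins that only 
mimic the relevant behaviour of {{UserGroupInformation}} (equality based on 
the identity of its {{Subject}}) and of {{FileSystem.CACHE}} / 
{{FileSystem.closeAllForUGI}}.  In real code the request body would run inside 
{{ugi.doAs(...)}} and call {{FileSystem.get(conf)}}.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of FileSystem.CACHE keyed by a UGI-like object; not Hadoop code. */
public class RequestScopedUgiDemo {

    /** Stand-in for UserGroupInformation: equality is identity of the wrapped subject. */
    public static final class Ugi {
        public final String user;
        private final Object subject = new Object(); // stands in for javax.security.auth.Subject

        public Ugi(String user) { this.user = user; }

        @Override public boolean equals(Object o) {
            return o instanceof Ugi && ((Ugi) o).subject == subject;
        }
        @Override public int hashCode() { return System.identityHashCode(subject); }
    }

    /** Stand-in for FileSystem.CACHE: one cached "filesystem" per (scheme, ugi) key. */
    public static final class FsCache {
        private final Map<List<Object>, String> entries = new HashMap<>();

        public String get(String scheme, Ugi ugi) {
            List<Object> key = List.of(scheme, ugi);
            return entries.computeIfAbsent(key, k -> "fs[" + scheme + "," + ugi.user + "]");
        }

        /** Drops every entry owned by this UGI, like FileSystem.closeAllForUGI. */
        public void closeAllForUgi(Ugi ugi) {
            entries.keySet().removeIf(k -> k.get(1).equals(ugi));
        }

        public int size() { return entries.size(); }
    }

    /** One request: create the UGI once, reuse it throughout, clean up at the end. */
    public static void handleRequest(FsCache cache, String user) {
        Ugi ugi = new Ugi(user);
        try {
            cache.get("hdfs", ugi);   // several lookups within one request...
            cache.get("hdfs", ugi);   // ...hit the same cache entry
        } finally {
            cache.closeAllForUgi(ugi);
        }
    }

    public static void main(String[] args) {
        FsCache cache = new FsCache();
        for (int i = 0; i < 1000; i++) {
            handleRequest(cache, "hcat");
        }
        System.out.println("entries after 1000 requests: " + cache.size()); // prints 0
    }
}
```

The {{finally}} block is what keeps the cache bounded: however many lookups a 
request makes, its entries are released when the request ends.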

bq. I'd argue that we could achieve UGI immutability if we create a new UGI 
every time you add credentials to it, by composing the old UGI and the new 
credentials.

I admire the thought, but it won't work.  If you mean creating a new UGI:
* Login/relogin would be completely broken.  It's not possible to propagate the 
new UGI to everything holding the old UGI reference in every thread.
* Code operating in a {{doAs}} block would have to add another nested {{doAs}} 
to use the tokens/tickets it just acquired.
* Similarly, code currently expects to create a UGI, perform a {{doAs}} which 
may update the {{Subject}} with tickets or tokens, exit the {{doAs}} block, 
then call {{doAs}} on the same UGI and expect the tokens to be in the UGI.
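All three points rely on the {{Subject}} being shared and mutable.  The sketch 
below uses the real {{javax.security.auth.Subject}} class, but {{Ugi}}, 
{{addCredential}} and the copy-on-write {{withCredential}} (the rejected 
proposal) are hypothetical.  It shows that credentials added through the shared 
{{Subject}} are visible to every holder of the same UGI, while a "new UGI per 
credential" approach leaves existing references stale.

```java
import javax.security.auth.Subject;

public class SharedSubjectDemo {

    /** Hypothetical UGI-like wrapper around a JAAS Subject (not Hadoop's class). */
    public static final class Ugi {
        public final Subject subject;

        public Ugi(Subject subject) { this.subject = subject; }

        /** Current behaviour: mutate the shared Subject in place. */
        public void addCredential(Object token) {
            subject.getPrivateCredentials().add(token);
        }

        /** Rejected proposal: a new Ugi composed of the old credentials plus the new one. */
        public Ugi withCredential(Object token) {
            Subject copy = new Subject();
            copy.getPrincipals().addAll(subject.getPrincipals());
            copy.getPublicCredentials().addAll(subject.getPublicCredentials());
            copy.getPrivateCredentials().addAll(subject.getPrivateCredentials());
            copy.getPrivateCredentials().add(token);
            return new Ugi(copy);
        }

        public boolean hasCredential(Object token) {
            return subject.getPrivateCredentials().contains(token);
        }
    }

    public static void main(String[] args) {
        Ugi ugi = new Ugi(new Subject());
        Ugi heldElsewhere = ugi;                 // e.g. a reference held by another thread

        ugi.addCredential("delegation-token");
        System.out.println(heldElsewhere.hasCredential("delegation-token")); // true

        Ugi newer = ugi.withCredential("service-ticket");
        System.out.println(newer.hasCredential("service-ticket"));           // true
        System.out.println(heldElsewhere.hasCredential("service-ticket"));   // false: stale
    }
}
```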

If you mean swapping out the UGI's subject, it won't work either.  The UGI's 
hash code is based on the {{Subject}}'s identity hash, so swapping out the 
{{Subject}} when tokens are added changes the hash code of a UGI that may 
already be stored in a hash-based collection, which fouls that collection.
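A small illustration of how that fouls a {{HashMap}}, using a hypothetical 
UGI-like key whose {{hashCode}}/{{equals}} follow the identity of its subject 
(a plain {{Object}} stands in for the {{Subject}}; this is not Hadoop's class):

```java
import java.util.HashMap;
import java.util.Map;

public class SwappedSubjectDemo {

    /** Hypothetical UGI-like key whose hashCode/equals follow the identity of its subject. */
    public static final class Ugi {
        private Object subject = new Object(); // stands in for javax.security.auth.Subject

        /** The rejected idea: replace the subject whenever credentials are added. */
        public void swapSubject() { subject = new Object(); }

        @Override public boolean equals(Object o) {
            return o instanceof Ugi && ((Ugi) o).subject == subject;
        }
        @Override public int hashCode() { return System.identityHashCode(subject); }
    }

    public static void main(String[] args) {
        Map<Ugi, String> cache = new HashMap<>();
        Ugi ugi = new Ugi();
        cache.put(ugi, "fs");

        ugi.swapSubject();                            // hash code changes under the map
        System.out.println(cache.containsKey(ugi));   // false (barring an identity-hash collision)
        System.out.println(cache.size());             // 1: the stale entry can no longer be found
    }
}
```

The entry is still in the map but unreachable by lookup, so it can neither be 
reused nor removed through the key: exactly the kind of leak this issue is about.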

There might be some interesting things that could be done by moving the fs 
cache into the {{Subject}}'s credentials.  A finalizer could ensure that all 
{{Closeable}} objects (such as fs instances) in the {{Subject}}'s credentials 
are closed.  That still requires a UGI to be scoped to a request, but it could 
better ensure that everything is cleaned up for the security context.
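A rough sketch of that idea, assuming {{Closeable}} resources (such as fs 
instances) are stored in the {{Subject}}'s private credentials.  It uses the 
real {{javax.security.auth.Subject}} API; {{closeAllCredentials}} is a 
hypothetical helper.  Instead of a finalizer it is called explicitly when the 
request ends, because finalizers are deprecated in current Java; 
{{java.lang.ref.Cleaner}} is the usual replacement if automatic cleanup is wanted.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Set;
import javax.security.auth.Subject;

public class SubjectScopedResources {

    /** Closes and removes every Closeable stored in the subject's private credentials. */
    public static int closeAllCredentials(Subject subject) {
        // getPrivateCredentials(Class) returns a snapshot, so removing from the
        // live credential set while iterating the snapshot is safe.
        Set<Closeable> closeables = subject.getPrivateCredentials(Closeable.class);
        int closed = 0;
        for (Closeable c : closeables) {
            try {
                c.close();
                closed++;
            } catch (IOException e) {
                // Keep going: one failing resource must not leak the others.
                System.err.println("close failed: " + e);
            }
            subject.getPrivateCredentials().remove(c);
        }
        return closed;
    }

    public static void main(String[] args) {
        Subject subject = new Subject();
        boolean[] closed = {false};
        subject.getPrivateCredentials().add((Closeable) () -> closed[0] = true); // e.g. an fs
        subject.getPrivateCredentials().add("some-token"); // non-Closeable credentials stay

        int n = closeAllCredentials(subject);
        System.out.println(n + " " + closed[0]);                      // 1 true
        System.out.println(subject.getPrivateCredentials().size());  // 1
    }
}
```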
                
> Memory leak from large number of FileSystem instances in FileSystem.CACHE. 
> (Must cache UGIs.)
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3098
>                 URL: https://issues.apache.org/jira/browse/HIVE-3098
>             Project: Hive
>          Issue Type: Bug
>          Components: Shims
>    Affects Versions: 0.9.0
>         Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security 
> turned on.
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-3098.patch
>
>
> The problem manifested from stress-testing HCatalog 0.4.1 (as part of testing 
> the Oracle backend).
> The HCatalog server ran out of memory (-Xmx2048m) when pounded by 60 threads, 
> in under 24 hours. The heap dump indicates that hadoop::FileSystem.CACHE had 
> 1000000 instances of FileSystem, whose combined retained memory consumed the 
> entire heap.
> It boiled down to hadoop::UserGroupInformation::equals() being implemented 
> such that the "Subject" member is compared for equality ("=="), and not 
> equivalence (".equals()"). This causes equivalent UGI instances to compare as 
> unequal, and causes a new FileSystem instance to be created and cached.
> The UGI.equals() is so implemented, incidentally, as a fix for yet another 
> problem (HADOOP-6670); so it is unlikely that that implementation can be 
> modified.
> The solution for this is to check for UGI equivalence in HCatalog (i.e. in 
> the Hive metastore), using a cache for UGI instances in the shims.
> I have a patch to fix this. I'll upload it shortly. I just ran an overnight 
> test to confirm that the memory-leak has been arrested.
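The leak described in the issue, and the UGI-caching workaround it proposes, 
can be modelled in a few lines.  Everything here is hypothetical: {{Ugi}} 
mimics UGI's identity-based {{equals}}, a {{HashMap}} stands in for 
{{FileSystem.CACHE}}, and {{UgiCache}} is a simplified cache keyed by user 
name.  It is not the code in HIVE-3098.patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UgiCacheDemo {

    /** Hypothetical UGI stand-in: two instances for the same user are NOT equal. */
    public static final class Ugi {
        public final String user;
        private final Object subject = new Object();

        public Ugi(String user) { this.user = user; }

        @Override public boolean equals(Object o) {
            return o instanceof Ugi && ((Ugi) o).subject == subject; // identity, not equivalence
        }
        @Override public int hashCode() { return System.identityHashCode(subject); }
    }

    /** Hands out one shared Ugi per user name, so equivalent requests reuse the same key. */
    public static final class UgiCache {
        private final Map<String, Ugi> byUser = new ConcurrentHashMap<>();

        public Ugi get(String user) { return byUser.computeIfAbsent(user, Ugi::new); }
    }

    /** Simulates the given number of requests against an identity-keyed fs cache. */
    public static int fsCacheSize(int requests, boolean cacheUgis) {
        Map<Ugi, String> fsCache = new HashMap<>(); // stand-in for FileSystem.CACHE
        UgiCache ugis = new UgiCache();
        for (int i = 0; i < requests; i++) {
            Ugi ugi = cacheUgis ? ugis.get("hcat") : new Ugi("hcat");
            fsCache.computeIfAbsent(ugi, u -> "fs for " + u.user);
        }
        return fsCache.size();
    }

    public static void main(String[] args) {
        System.out.println(fsCacheSize(10_000, false)); // 10000: one entry per fresh UGI
        System.out.println(fsCacheSize(10_000, true));  // 1: every request reuses the cached UGI
    }
}
```

A cached UGI outlives any single request, so anything added to its 
{{Subject}} during one request stays visible to later ones; that is the 
concern the comment raises against this approach.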


