[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398979#comment-13398979 ]

Rohini Palaniswamy commented on HIVE-3098:
------------------------------------------

@Daryn
{code}
  public static UserGroupInformation createProxyUser(String user,
      UserGroupInformation realUser) {
    Subject subject = new Subject();
    Set<Principal> principals = subject.getPrincipals();
    principals.add(new User(user));
    principals.add(new RealUser(realUser));
    UserGroupInformation result = new UserGroupInformation(subject);
{code}

  Since the new UGI refers to the realUser (which is 
UserGroupInformation.getLoginUser()), the tokens used for authentication are 
those of the logged-in user. Since the TGT is part of the logged-in user's 
tokens, it will always be used for SASL authentication. Currently the 
filesystem fetched inside doAs in the Hive metastore is only hdfs://, so only 
SASL authentication will be done. 
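A minimal self-contained model of that point (plain Java; RealUser/ProxyUser here are hypothetical stand-ins, not the Hadoop classes): because the proxy UGI holds a reference back to the login user, authentication consults the login user's credential set, which is where the TGT lives.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: the proxy user keeps a *reference* to the real
// (login) user's credential set, so whatever the login user holds
// (e.g. a Kerberos TGT) is what gets used for SASL authentication.
class RealUser {
    final Set<String> credentials = new HashSet<>();
}

class ProxyUser {
    final String name;
    final RealUser realUser;  // the new UGI refers back to the login user

    ProxyUser(String name, RealUser realUser) {
        this.name = name;
        this.realUser = realUser;
    }

    // Authentication uses the real user's tokens, not the proxy's own.
    Set<String> tokensForAuthentication() {
        return realUser.credentials;
    }
}

class ProxyModel {
    public static void main(String[] args) {
        RealUser login = new RealUser();
        login.credentials.add("TGT");  // ticket belongs to the login user
        ProxyUser proxy = new ProxyUser("alice", login);
        System.out.println(proxy.tokensForAuthentication().contains("TGT")); // true
    }
}
```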

The problem you mentioned with hftp:// and webhdfs:// does exist. The UGI does 
not get modified with the addition of tokens, but the FileSystem instance 
itself becomes unusable, because those classes fetch delegation tokens in 
their constructors and store them in private variables. The 
DelegationTokenRenewer in those classes takes care of renewing the token, but 
once the renewal limit is reached (7 days by default) the local variable is 
not updated, so the FileSystem instance becomes unusable. One way to get it 
working would be to fix the DelegationTokenRenewer to fetch a new token and 
update the local variable after the token expires.
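A sketch of that fix idea (plain Java; CachedTokenRenewer and its fields are hypothetical names, not Hadoop's actual DelegationTokenRenewer): once the renewal limit is exhausted, re-fetch a fresh token and swap the private reference instead of letting it go stale.

```java
import java.util.function.Supplier;

// Hypothetical sketch: a renewer that, instead of leaving the cached
// token stale after the max renewal lifetime (7 days by default),
// fetches a fresh token and replaces the private reference the
// FileSystem holds, so the FileSystem stays usable.
class CachedTokenRenewer {
    private String token;                          // the "local variable" in the FileSystem
    private int renewalsLeft;                      // renewals allowed before the limit
    private final Supplier<String> fetchNewToken;  // e.g. re-contact the NameNode

    CachedTokenRenewer(String initial, int maxRenewals, Supplier<String> fetchNewToken) {
        this.token = initial;
        this.renewalsLeft = maxRenewals;
        this.fetchNewToken = fetchNewToken;
    }

    String currentToken() {
        return token;
    }

    void renew() {
        if (renewalsLeft > 0) {
            renewalsLeft--;                // normal renewal: token stays valid
        } else {
            token = fetchNewToken.get();   // past the limit: get a brand-new token
        }
    }
}
```

With this shape, the first renewals behave as today, and the renewal after the limit replaces the token rather than leaving the FileSystem unusable.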

@Ashutosh,
   We need to add closing of the filesystems along with the cache if we are 
going to access hftp:// or webhdfs:// in the Hive metastore. I am not sure 
about hftp:// since it is read-only, but users might start specifying 
partition locations using webhdfs:// soon.
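A sketch of what "closing along with the cache" could look like (plain Java; ClosingCache is a hypothetical name, but real Hadoop FileSystems are Closeable, so eviction can close them):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a cache that closes entries as they are replaced
// or evicted, so hftp:// / webhdfs:// FileSystem instances do not linger
// with stale delegation tokens after being dropped from the cache.
class ClosingCache<K, V extends Closeable> {
    private final Map<K, V> cache = new HashMap<>();

    void put(K key, V value) {
        closeQuietly(cache.put(key, value));  // close any replaced instance
    }

    V get(K key) {
        return cache.get(key);
    }

    void evict(K key) {
        closeQuietly(cache.remove(key));      // close along with removal
    }

    private static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // best-effort close during eviction
        }
    }
}
```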


> Memory leak from large number of FileSystem instances in FileSystem.CACHE. 
> (Must cache UGIs.)
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3098
>                 URL: https://issues.apache.org/jira/browse/HIVE-3098
>             Project: Hive
>          Issue Type: Bug
>          Components: Shims
>    Affects Versions: 0.9.0
>         Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security 
> turned on.
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-3098.patch
>
>
> The problem manifested from stress-testing HCatalog 0.4.1 (as part of testing 
> the Oracle backend).
> The HCatalog server ran out of memory (-Xmx2048m) when pounded by 60-threads, 
> in under 24 hours. The heap-dump indicates that hadoop::FileSystem.CACHE had 
> 1000000 instances of FileSystem, whose combined retained-mem consumed the 
> entire heap.
> It boiled down to hadoop::UserGroupInformation::equals() being implemented 
> such that the "Subject" member is compared for equality ("=="), and not 
> equivalence (".equals()"). This causes equivalent UGI instances to compare as 
> unequal, and causes a new FileSystem instance to be created and cached.
> The UGI.equals() is so implemented, incidentally, as a fix for yet another 
> problem (HADOOP-6670); so it is unlikely that that implementation can be 
> modified.
> The solution for this is to check for UGI equivalence in HCatalog (i.e. in 
> the Hive metastore), using a cache for UGI instances in the shims.
> I have a patch to fix this. I'll upload it shortly. I just ran an overnight 
> test to confirm that the memory-leak has been arrested.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
