[jira] [Commented] (HDFS-8855) Webhdfs client leaks active NameNode connections

Owen O'Malley (JIRA) Thu, 17 Sep 2015 14:31:07 -0700

    [ https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804547#comment-14804547 ]


Owen O'Malley commented on HDFS-8855:
-------------------------------------

A few points:
* Use Token.getKind(), Token.getIdentifier(), and Token.getPassword() as the 
key for the cache. The patch currently uses Token.toString(), which is built 
from the identifier, kind, and service. The service is set by the client, so 
it shouldn't be part of the match. The password, on the other hand, must be 
part of the match so that guessing the identifier doesn't allow an attacker 
to impersonate the user.
* The timeout should default to 10 minutes instead of 10 seconds.
* Please fix the checkstyle and findbugs warnings.
* Determine what is wrong with the test case.

Other than that, it looks good.
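
To make the first point concrete, here is a minimal sketch of a cache key 
that compares kind, identifier, and password and ignores the service. The 
class name TokenCacheKey and its constructor are hypothetical and are not 
taken from the patch. The sketch takes plain values rather than a Hadoop 
Token so that it stands alone.

```java
import java.util.Arrays;

/** Hypothetical cache key; not the class used in the HDFS-8855 patch. */
final class TokenCacheKey {
  private final String kind;
  private final byte[] identifier;
  private final byte[] password;

  TokenCacheKey(String kind, byte[] identifier, byte[] password) {
    this.kind = kind;
    // Defensive copies so later changes to the caller's arrays
    // cannot alter a key that is already in the cache.
    this.identifier = identifier.clone();
    this.password = password.clone();
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof TokenCacheKey)) {
      return false;
    }
    TokenCacheKey other = (TokenCacheKey) o;
    // The service is deliberately absent: it is set by the client.
    // The password is included so that a guessed identifier alone
    // does not match an existing entry.
    return kind.equals(other.kind)
        && Arrays.equals(identifier, other.identifier)
        && Arrays.equals(password, other.password);
  }

  @Override
  public int hashCode() {
    int h = kind.hashCode();
    h = 31 * h + Arrays.hashCode(identifier);
    h = 31 * h + Arrays.hashCode(password);
    return h;
  }
}
```

With a real token t, such a key would presumably be built from 
t.getKind().toString(), t.getIdentifier(), and t.getPassword(). The byte 
arrays need Arrays.equals and Arrays.hashCode because Java arrays otherwise 
compare by reference.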

> Webhdfs client leaks active NameNode connections
> ------------------------------------------------
>
>                 Key: HDFS-8855
>                 URL: https://issues.apache.org/jira/browse/HDFS-8855
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>            Reporter: Bob Hansen
>            Assignee: Xiaobing Zhou
>         Attachments: HDFS-8855.1.patch, HDFS-8855.2.patch, HDFS-8855.3.patch, 
> HDFS-8855.4.patch, HDFS_8855.prototype.patch
>
>
> The attached script simulates a process opening ~50 files via webhdfs and 
> performing random reads.  Note that there are at most 50 concurrent reads, 
> and all webhdfs sessions are kept open.  Each read is ~64k at a random 
> position.  
> The script periodically (once per second) shells into the NameNode and 
> produces a summary of the socket states.  For my test cluster with 5 nodes, 
> it took ~30 seconds for the NameNode to have ~25000 active connections and 
> fail.
> It appears that each request to the webhdfs client is opening a new 
> connection to the NameNode and keeping it open after the request is complete. 
>  If the process continues to run, eventually (~30-60 seconds), all of the 
> open connections are closed and the NameNode recovers.  
> This smells like SoftReference reaping.  Are we using SoftReferences in the 
> webhdfs client to cache NameNode connections but never re-using them?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
