[ https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277150#comment-14277150 ]
Daryn Sharp commented on HDFS-7597:
-----------------------------------
Good discussion:
* The commons LRUMap is no-frills, lightweight, and specifically tailored to
this use case.
* Guava, in my experience, is developer-friendly at the expense of performance
and generates excessive garbage. A quick glance at the cache builder code
confirms this.
* A ConcurrentHashMap (CHM) is neither useful nor performant unless you intend
to cache many multiples of the number of accessing threads, probably on the
order of thousands of entries, which is overkill here.
If we occasionally end up creating 2 UGIs instead of 1, that's still far
better than an unbounded number of them.
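To make the trade-off above concrete, here is a minimal sketch (not the
attached patch) of a small, bounded LRU cache guarded by a plain lock. It uses
java.util.LinkedHashMap in access order as a stand-in for the commons LRUMap
so that it compiles without extra dependencies. The class name
BoundedUgiCache, the method getOrCreate, and the idea of keying on a token
identifier are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical bounded LRU cache; V stands in for a UGI, K for a token id. */
final class BoundedUgiCache<K, V> {
    private final Map<K, V> lru;

    BoundedUgiCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered least-recently-used first,
        // and removeEldestEntry evicts that entry once the bound is exceeded.
        this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value for key, creating it on a miss. */
    V getOrCreate(K key, Supplier<V> factory) {
        synchronized (this) {
            V cached = lru.get(key);
            if (cached != null) {
                return cached;
            }
        }
        // Created outside the lock: two racing threads may both build a
        // value, but only the first one inserted is kept and shared.
        V created = factory.get();
        synchronized (this) {
            V existing = lru.get(key);
            if (existing != null) {
                return existing;
            }
            lru.put(key, created);
            return created;
        }
    }

    synchronized int size() {
        return lru.size();
    }
}
```

The coarse lock is deliberate in this sketch: with a bound this small, the
occasional duplicate creation on a race is the only cost, and the number of
cached entries can never exceed maxEntries.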
Let me take a look at the test failure...
> Clients seeking over webhdfs may crash the NN
> ---------------------------------------------
>
> Key: HDFS-7597
> URL: https://issues.apache.org/jira/browse/HDFS-7597
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: webhdfs
> Affects Versions: 2.0.0-alpha
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Priority: Critical
> Attachments: HDFS-7597.patch
>
>
> Webhdfs seeks involve closing the current connection and issuing a new
> open request with the new offset. The RPC layer caches connections, so the DN
> keeps a lingering connection open to the NN. Connection caching is in part
> based on UGI. Although the client uses the same token for the new offset
> request, the UGI is different, which forces the DN to open another,
> unnecessary connection to the NN.
> A job that performs many seeks will easily crash the NN due to file
> descriptor (fd) exhaustion.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)