[
https://issues.apache.org/jira/browse/HADOOP-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777870#comment-13777870
]
Daryn Sharp commented on HADOOP-9955:
-------------------------------------
Good points, Suresh. The {{Server}} class overall isn't trivial to understand
with all the nested classes nestled between the methods of the outer class, so
it'll be an improvement.
BTW, I stumbled upon another bug: connection closing isn't +extremely+
inefficient... it's +completely+ inefficient. The rpc count used to detect an
idle connection (count == 0) goes negative when security is enabled, because
SASL responses decrement the rpc count without ever incrementing it.
*No connections are ever actually closed. The NN just slows itself down under
load!* It's been like this since SASL was added.
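To illustrate the counter problem, here is a minimal sketch. The class and
method names are hypothetical and are not taken from the actual {{Server}}
code; it only models a counter where a normal call increments and then
decrements, while a SASL response decrements without a matching increment.

```java
// Hypothetical model of a per-connection rpc counter (not the real Server code).
class RpcCountSketch {
    // Outstanding-RPC counter; the connection is treated as idle only at 0.
    private int rpcCount = 0;

    void incRpcCount() { rpcCount++; }
    void decRpcCount() { rpcCount--; }
    int getRpcCount() { return rpcCount; }
    boolean isIdle() { return rpcCount == 0; }

    // Normal call path: increment on receipt, decrement once the response is sent.
    void handleCall() {
        incRpcCount();
        decRpcCount();
    }

    // SASL response path as described above: decrement with no prior increment.
    void sendSaslResponse() {
        decRpcCount();
    }

    public static void main(String[] args) {
        RpcCountSketch conn = new RpcCountSketch();
        conn.handleCall();
        System.out.println("after normal call: idle=" + conn.isIdle());
        conn.sendSaslResponse();
        System.out.println("after SASL response: count=" + conn.getRpcCount()
                + " idle=" + conn.isIdle());
    }
}
```

Once the count drops below zero, later balanced calls return it to the same
negative value, so a check for count == 0 never succeeds again.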
BTW, I'm currently performance testing the patch.
> RPC idle connection closing is extremely inefficient
> ----------------------------------------------------
>
> Key: HADOOP-9955
> URL: https://issues.apache.org/jira/browse/HADOOP-9955
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: ipc
> Affects Versions: 2.0.0-alpha, 3.0.0
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Attachments: HADOOP-9955.patch
>
>
> The RPC server listener loops accepting connections, distributing the new
> connections to socket readers, and then conditionally & periodically performs
> a scan for idle connections. The idle scan chooses a _random index range_ to
> scan in a _synchronized linked list_.
> With 20k+ connections, walking the range of indices in the linked list is
> extremely expensive. During the sweep, other threads (socket responder and
> readers) that want to close connections are blocked, and no new connections
> are being accepted.
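The cost described in the issue can be sketched as follows. This is only an
illustration under assumptions: {{IdleScanSketch}}, {{Conn}} and both sweep
methods are hypothetical names, and the iterator variant is shown for contrast
only, not as what the attached patch does. In {{java.util.LinkedList}}, each
{{get(i)}} walks the list from the nearer end, so an index-based sweep over a
range re-walks the list for every element while holding the lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class IdleScanSketch {
    // Hypothetical stand-in for a server connection.
    static final class Conn {
        final int id;
        final boolean idle;
        Conn(int id, boolean idle) { this.id = id; this.idle = idle; }
    }

    // Index-based sweep: every get(i) traverses the linked list again,
    // and the lock is held for the whole range.
    static List<Integer> sweepByIndex(List<Conn> conns, int start, int end) {
        List<Integer> idleIds = new ArrayList<>();
        synchronized (conns) {
            for (int i = start; i < end && i < conns.size(); i++) {
                Conn c = conns.get(i);
                if (c.idle) idleIds.add(c.id);
            }
        }
        return idleIds;
    }

    // Iterator-based sweep: position once, then step node by node.
    static List<Integer> sweepByIterator(List<Conn> conns, int start, int end) {
        List<Integer> idleIds = new ArrayList<>();
        synchronized (conns) {
            ListIterator<Conn> it = conns.listIterator(start);
            for (int i = start; i < end && it.hasNext(); i++) {
                Conn c = it.next();
                if (c.idle) idleIds.add(c.id);
            }
        }
        return idleIds;
    }

    public static void main(String[] args) {
        List<Conn> conns = Collections.synchronizedList(new LinkedList<>());
        for (int i = 0; i < 20; i++) {
            conns.add(new Conn(i, i % 3 == 0));
        }
        System.out.println(sweepByIndex(conns, 5, 15));
        System.out.println(sweepByIterator(conns, 5, 15));
    }
}
```

Both sweeps find the same idle connections; they differ only in how many list
nodes are visited while other threads wait on the lock.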