Jinglun created HADOOP-16403:
--------------------------------

             Summary: Start a new statistical rpc queue and make the Reader's 
pendingConnection queue runtime-replaceable
                 Key: HADOOP-16403
                 URL: https://issues.apache.org/jira/browse/HADOOP-16403
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Jinglun


I have an HA cluster with 2 NameNodes. The NameNode's metadata is quite large, so 
after the active NameNode dies, it takes the standby more than 40s to become active. 
Many requests (TCP connect requests and RPC requests) from DataNodes, clients, and 
ZKFC time out and start retrying. The sudden request flood lasts for the 
next 2 minutes, until all requests have either been handled or have run out of 
retries. 
Adjusting the RPC-related settings might help the NameNode cope and solve this 
problem, but the key point is finding the bottleneck. The RPC server can be 
described as below:
{noformat}
Listener -> Readers' queues -> Readers -> callQueue -> Handlers
{noformat}
By sampling some failed clients, I found that many of them got a ConnectException, 
caused by a TCP connect request that went unanswered for 20s. I think the reader 
queue may be full and blocking the listener from handling new connections. Both slow 
handlers and slow readers can block the whole processing pipeline, and I need 
to know which one it is. I think *a queue that computes the QPS, writes a log entry when the 
queue is full, and can be replaced easily* will help. 
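
To illustrate the suspected failure mode, here is a minimal toy model, not Hadoop's actual Server code. The class name ReaderQueueDemo and the method countStalledAccepts are hypothetical. It assumes the listener hands each accepted connection to a bounded queue and that no reader drains it, so every hand-off beyond the queue capacity stalls (a timed offer stands in for a blocking put).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Toy model of the Listener -> reader queue hand-off; names are hypothetical. */
class ReaderQueueDemo {

  /**
   * Hands n connections to a bounded queue that nobody drains and
   * returns how many hand-offs timed out because the queue was full.
   */
  static int countStalledAccepts(int queueCapacity, int n, long timeoutMs) {
    BlockingQueue<Integer> pending = new LinkedBlockingQueue<>(queueCapacity);
    int stalled = 0;
    for (int conn = 0; conn < n; conn++) {
      try {
        // A real listener would block here; the timeout makes the stall observable.
        if (!pending.offer(conn, timeoutMs, TimeUnit.MILLISECONDS)) {
          stalled++;
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return stalled;
  }

  public static void main(String[] args) {
    // Capacity 2, 5 connections, no reader: the last 3 hand-offs stall.
    System.out.println("stalled accepts: " + countStalledAccepts(2, 5, 10));
  }
}
```

In this model the stalls alone do not show whether the readers or the handlers behind them are slow, which is why per-queue statistics are needed.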
I found the nice work in HADOOP-10302, which implements a runtime-swappable queue. Using 
it for the Reader's queue makes the reader queue runtime-swappable automatically. The 
QPS computation could be done by a subclass of LinkedBlockingQueue 
that updates the statistics whenever put/take/... happens. The QPS data will be 
exposed via JMX.
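
A rough sketch of such a statistical queue, under stated assumptions: the class name StatBlockingQueue and its getters are hypothetical and not part of Hadoop, only the untimed put/offer/take/poll are instrumented, and the "log" is a plain stderr line. The QPS is computed as the number of puts since the previous sample divided by the elapsed time.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counting queue; a sketch, not the proposed patch. */
class StatBlockingQueue<E> extends LinkedBlockingQueue<E> {
  private final LongAdder puts = new LongAdder();
  private final LongAdder takes = new LongAdder();
  private final LongAdder fullEvents = new LongAdder();
  private long lastSampleNanos = System.nanoTime();
  private long lastPuts;

  StatBlockingQueue(int capacity) {
    super(capacity);
  }

  @Override
  public void put(E e) throws InterruptedException {
    if (remainingCapacity() == 0) {
      noteFull(); // the caller is about to block
    }
    super.put(e);
    puts.increment();
  }

  @Override
  public boolean offer(E e) {
    boolean accepted = super.offer(e);
    if (accepted) {
      puts.increment();
    } else {
      noteFull();
    }
    return accepted;
  }

  @Override
  public E take() throws InterruptedException {
    E e = super.take();
    takes.increment();
    return e;
  }

  @Override
  public E poll() {
    E e = super.poll();
    if (e != null) {
      takes.increment();
    }
    return e;
  }

  private void noteFull() {
    fullEvents.increment();
    System.err.println("queue is full, producer rejected or blocked");
  }

  long getPutCount() { return puts.sum(); }
  long getTakeCount() { return takes.sum(); }
  long getFullCount() { return fullEvents.sum(); }

  /** Puts per second since the previous call (or since construction). */
  synchronized double samplePutQps() {
    long now = System.nanoTime();
    long total = puts.sum();
    double seconds = (now - lastSampleNanos) / 1e9;
    double qps = seconds > 0 ? (total - lastPuts) / seconds : 0.0;
    lastPuts = total;
    lastSampleNanos = now;
    return qps;
  }
}
```

The getters here are the kind of values that could be published on JMX. How they would be wired into Hadoop's metrics is left open in this sketch.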


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
