[ 
https://issues.apache.org/jira/browse/HADOOP-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897945#comment-16897945
 ] 

Jinglun commented on HADOOP-16403:
----------------------------------

About the shadedclient error, I searched 
[patch-shadedclient.txt|https://builds.apache.org/job/PreCommit-HADOOP-Build/16437/artifact/out/patch-shadedclient.txt]
 and found this:
{quote}[ERROR] Found artifact with unexpected contents: 
'/testptch/hadoop/hadoop-client-modules/hadoop-client-api/target/hadoop-client-api-3.3.0-SNAPSHOT.jar'
 Please check the following and either correct the build or update
 the allowed list with reasoning.

core-default.xml.orig
{quote}
There is a jar-contents check in 
*_./hadoop-client-modules/hadoop-client-check-invariants/src/test/resources/ensure-jars-have-correct-contents.sh_*,
 and it seems that core-default.xml.orig was packaged into 
hadoop-client-api-3.3.0-SNAPSHOT.jar, so the check rejects the jar. 
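
To make the failure mode concrete, here is a rough Java sketch of what such a contents check does: list a jar's entries and report any that match none of a set of allowed patterns. It is not the logic of ensure-jars-have-correct-contents.sh. The class name JarContentsCheck and the ALLOWED_REGEXES list are hypothetical, chosen only so that a stray core-default.xml.orig is reported the same way it is in the error above while core-default.xml passes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Hypothetical sketch, not the real invariants script: flag jar entries outside an allowlist. */
class JarContentsCheck {
  // Hypothetical allowlist; the real script maintains its own list.
  static final List<String> ALLOWED_REGEXES = Arrays.asList(
      "META-INF/.*",
      "org/apache/hadoop/.*",
      "[^/]*-default\\.xml"); // matches core-default.xml but not core-default.xml.orig

  /** Returns the entry names that match none of the allowed patterns. */
  static List<String> unexpected(List<String> entryNames) {
    List<String> bad = new ArrayList<>();
    for (String name : entryNames) {
      if (name.endsWith("/")) {
        continue; // directory entries are skipped
      }
      boolean allowed = ALLOWED_REGEXES.stream().anyMatch(name::matches);
      if (!allowed) {
        bad.add(name);
      }
    }
    return bad;
  }

  /** Reads all entry names from a jar on disk. */
  static List<String> entriesOf(String jarPath) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      return jar.stream().map(JarEntry::getName).collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    // Usage: java JarContentsCheck path/to/hadoop-client-api-3.3.0-SNAPSHOT.jar
    List<String> names = args.length > 0
        ? entriesOf(args[0])
        : Arrays.asList("META-INF/MANIFEST.MF", "core-default.xml", "core-default.xml.orig");
    System.out.println(unexpected(names)); // with no argument, prints [core-default.xml.orig]
  }
}
```

Running it against the built jar lists every entry the allowlist does not cover, which is a quick way to confirm whether a rebuilt patch still ships the .orig file.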

I'm not sure how this happened. I made a new patch from the latest trunk and 
fixed the checkstyle issues. Uploading patch-005 to see whether the shadedclient 
error still occurs.


> Start a new statistical rpc queue and make the Reader's pendingConnection 
> queue runtime-replaceable
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16403
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16403
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HADOOP-16403-How_MetricLinkedBlockingQueue_Works.pdf, 
> HADOOP-16403.001.patch, HADOOP-16403.002.patch, HADOOP-16403.003.patch, 
> HADOOP-16403.004.patch, MetricLinkedBlockingQueueTest.pdf
>
>
> I have an HA cluster with 2 NameNodes. The NameNode's metadata is quite big, 
> so after the active NameNode dies, it takes the standby more than 40s to become 
> active. Many requests (TCP connect requests and RPC requests) from DataNodes, 
> clients and ZKFC time out and start retrying. The sudden request flood lasts 
> for the next 2 minutes, until all requests are either handled or run out of 
> retries. 
>  Adjusting the RPC-related settings might help the NameNode cope and solve 
> this problem, but the key point is finding the bottleneck. The RPC server can 
> be described as follows:
> {noformat}
> Listener -> Readers' queues -> Readers -> callQueue -> Handlers{noformat}
> By sampling some failed clients, I found that many of them got 
> ConnectTimeoutException, caused by a TCP connect request that went unanswered 
> for 20s. I suspect the reader queue is full, which blocks the listener from 
> handling new connections. Both slow handlers and slow readers can stall the 
> whole pipeline, and I need to know which one it is. I think *a queue that 
> computes its QPS, writes a log when it is full and can be replaced easily* 
> would help. 
>  I found the nice work in HADOOP-10302, which implements a runtime-swappable 
> queue. Using it for the Reader's queue makes the reader queue runtime-swappable 
> automatically. The QPS computation could be done by a subclass of 
> LinkedBlockingQueue that updates its counters whenever put/take/... happens. 
> The QPS data would then be exposed via JMX.
>  
>  
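
The queue proposed in the description (a LinkedBlockingQueue subclass that counts operations on put/take so a QPS can be derived, and that logs when the queue is full) might look roughly like the sketch below. It is an illustration under assumptions, not the HADOOP-16403 patch: the class name CountingLinkedBlockingQueue, its putCount/takeCount/fullCount methods and the perSecond helper are hypothetical, java.util.logging stands in for Hadoop's logging, and publishing the counters to JMX is left out. Only put, offer, take and poll are instrumented; timed offer/poll and drainTo are not counted.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.LongAdder;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not the HADOOP-16403 patch): a bounded LinkedBlockingQueue
 * that counts inserts/removals so a caller can derive QPS, and logs a warning
 * whenever an insert finds the queue full.
 */
class CountingLinkedBlockingQueue<E> extends LinkedBlockingQueue<E> {
  private static final Logger LOG =
      Logger.getLogger(CountingLinkedBlockingQueue.class.getName());

  private final LongAdder puts = new LongAdder();       // successful inserts
  private final LongAdder takes = new LongAdder();      // successful removals
  private final LongAdder fullEvents = new LongAdder(); // inserts that found the queue full

  CountingLinkedBlockingQueue(int capacity) {
    super(capacity);
  }

  private void onFull() {
    fullEvents.increment();
    LOG.warning("queue full, capacity=" + (size() + remainingCapacity()));
  }

  @Override
  public void put(E e) throws InterruptedException {
    // Approximate: other threads may change the size between this check and put().
    if (remainingCapacity() == 0) {
      onFull();
    }
    super.put(e); // blocks the caller until space is available
    puts.increment();
  }

  @Override
  public boolean offer(E e) {
    boolean added = super.offer(e); // never blocks; returns false when full
    if (added) {
      puts.increment();
    } else {
      onFull();
    }
    return added;
  }

  @Override
  public E take() throws InterruptedException {
    E e = super.take();
    takes.increment();
    return e;
  }

  @Override
  public E poll() {
    E e = super.poll();
    if (e != null) {
      takes.increment();
    }
    return e;
  }

  long putCount() { return puts.sum(); }
  long takeCount() { return takes.sum(); }
  long fullCount() { return fullEvents.sum(); }

  /** Converts a count observed over elapsedNanos into events per second. */
  static double perSecond(long count, long elapsedNanos) {
    return elapsedNanos <= 0 ? 0.0 : count * 1_000_000_000.0 / elapsedNanos;
  }
}

class CountingQueueDemo {
  public static void main(String[] args) throws InterruptedException {
    // Capacity 2 stands in for a small Reader pendingConnection queue.
    CountingLinkedBlockingQueue<String> q = new CountingLinkedBlockingQueue<>(2);
    q.offer("conn-1");
    q.offer("conn-2");
    boolean accepted = q.offer("conn-3"); // full: rejected, counted and logged
    q.take();
    System.out.println(accepted + " puts=" + q.putCount()
        + " takes=" + q.takeCount() + " full=" + q.fullCount());
    // prints "false puts=2 takes=1 full=1"
  }
}
```

A metrics thread could sample putCount() at a fixed interval and pass the delta and the elapsed time to perSecond() to get the QPS. Since LinkedBlockingQueue.put() blocks when the queue is full, a growing full count on the reader queue would point at slow readers stalling whoever enqueues connections, while a full callQueue would point at slow handlers instead.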



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
