[ https://issues.apache.org/jira/browse/HADOOP-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884346#comment-16884346 ]
Hadoop QA commented on HADOOP-16403:
------------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HADOOP-16403 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-16403 |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/16380/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Start a new statistical rpc queue and make the Reader's pendingConnection
> queue runtime-replaceable
> ---------------------------------------------------------------------------------------------------
>
> Key: HADOOP-16403
> URL: https://issues.apache.org/jira/browse/HADOOP-16403
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Jinglun
> Priority: Major
> Attachments: HADOOP-16403-How_MetricLinkedBlockingQueue_Works.pdf,
> HADOOP-16403.001.patch, MetricLinkedBlockingQueueTest.pdf
>
>
> I have an HA cluster with 2 NameNodes. The NameNode's metadata is quite large,
> so after the active NameNode dies, the standby takes more than 40s to become
> active. Many requests (TCP connect requests and RPC requests) from DataNodes,
> clients and ZKFC time out and start retrying. The sudden request flood lasts
> for the next 2 minutes, until every request has either been handled or has
> run out of retries.
> Adjusting the RPC-related settings might let the NameNode cope and solve this
> problem; the key point is finding the bottleneck. The RPC server can be
> described as below:
> {noformat}
> Listener -> Readers' queues -> Readers -> callQueue -> Handlers{noformat}
> By sampling some failed clients, I found that many of them got
> ConnectTimeoutException, caused by a TCP connect request that went unanswered
> for 20s. I think the reader queue may be full and blocking the listener from
> handling new connections. Both slow handlers and slow readers can block the
> whole processing pipeline, and I need to know which one is responsible. I
> think *a queue that computes the QPS, writes a log when the queue is full and
> can be replaced easily* will help.
> I found the nice work in HADOOP-10302, which implements a runtime-swappable
> queue. Using it for the Reader's queue makes the reader queue
> runtime-swappable as well. The QPS computation could be done by a subclass of
> LinkedBlockingQueue that does the counting whenever put/take/... happens.
> The QPS data will be exposed via JMX.
>
>
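For readers skimming the description above, here is a minimal sketch of the kind of counting queue it proposes. It is not the code from HADOOP-16403.001.patch: the class name MetricQueueSketch, its accessor methods and the stderr logging are all hypothetical, chosen only for illustration. It assumes that counting successful inserts and removals is enough, and that an external sampler (for example a JMX bean) would turn count deltas into QPS.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical sketch of a bounded queue that counts inserts, removals and
 * "queue full" events. Not the implementation attached to the issue.
 */
class MetricQueueSketch<E> extends LinkedBlockingQueue<E> {
    private final LongAdder puts = new LongAdder();
    private final LongAdder takes = new LongAdder();
    private final LongAdder fullEvents = new LongAdder();

    MetricQueueSketch(int capacity) {
        super(capacity);
    }

    /** Non-blocking insert; add(e) is routed here by AbstractQueue. */
    @Override
    public boolean offer(E e) {
        boolean accepted = super.offer(e);
        if (accepted) {
            puts.increment();
        } else {
            recordFull();
        }
        return accepted;
    }

    /** Blocking insert; notes when the caller is about to wait on a full queue. */
    @Override
    public void put(E e) throws InterruptedException {
        if (remainingCapacity() == 0) {
            recordFull(); // best-effort check: another thread may drain the queue first
        }
        super.put(e);
        puts.increment();
    }

    /** Blocking removal. */
    @Override
    public E take() throws InterruptedException {
        E e = super.take();
        takes.increment();
        return e;
    }

    /** Non-blocking removal; counts only when an element was actually returned. */
    @Override
    public E poll() {
        E e = super.poll();
        if (e != null) {
            takes.increment();
        }
        return e;
    }

    private void recordFull() {
        fullEvents.increment();
        // A real server would use its logging framework and rate-limit this message.
        System.err.println("queue full, size=" + size());
    }

    long putCount() { return puts.sum(); }
    long takeCount() { return takes.sum(); }
    long fullCount() { return fullEvents.sum(); }

    /** Converts a counter delta over an elapsed interval into operations per second. */
    static double perSecond(long countDelta, long elapsedNanos) {
        return elapsedNanos <= 0 ? 0.0 : countDelta * 1e9 / elapsedNanos;
    }
}
```

A sampler could read putCount() and takeCount() at a fixed interval and pass the differences to perSecond(). A rising fullCount() on the reader queue while the callQueue stays short would point at slow readers rather than slow handlers. The timed offer/poll variants and drainTo are left uncounted here to keep the sketch short.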