[
https://issues.apache.org/jira/browse/HDFS-8820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647162#comment-14647162
]
Ming Ma commented on HDFS-8820:
-------------------------------
Thanks [~arpitagarwal]. Should we enable this for communication between the DN
and the NN? It appears RetriableException is only handled by
FailoverOnNetworkExceptionRetry, which the client uses in the NN HA scenario;
the DN doesn't use that retry policy when it communicates with the NN. In our
clusters, we configure a service port on the NN, so DN RPCs go to the service
RPC server, and backoff isn't enabled on that service RPC server. We could have
the DN use a retry policy that supports RetriableException, but that will
require extra work.
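To make the "extra work" concrete, here is a minimal sketch of what a retry wrapper that honors RetriableException could look like on the DN side. It is not Hadoop code: {{BackoffRetry}} and its parameters are hypothetical, and the local {{RetriableException}} class is only a stand-in for {{org.apache.hadoop.ipc.RetriableException}} so the snippet compiles on its own.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Stand-in for org.apache.hadoop.ipc.RetriableException (assumption, for a standalone sketch). */
class RetriableException extends IOException {
  RetriableException(String msg) { super(msg); }
}

/** Hypothetical helper, not part of Hadoop. */
final class BackoffRetry {
  private BackoffRetry() {}

  /**
   * Invokes the call and retries only when the server signals backoff via
   * RetriableException. Any other exception propagates immediately.
   */
  static <T> T call(Callable<T> rpc, int maxRetries, long baseSleepMs) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return rpc.call();
      } catch (RetriableException e) {
        if (attempt >= maxRetries) {
          throw e; // retries exhausted, surface the last failure
        }
        // Exponential backoff; the shift is capped to avoid overflow.
        Thread.sleep(baseSleepMs << Math.min(attempt, 10));
      }
    }
  }
}
```

A caller would wrap each DN-to-NN RPC, e.g. {{BackoffRetry.call(() -> sendHeartbeat(), 3, 100L)}}, where {{sendHeartbeat}} is again a placeholder name.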
For the configuration part, I wonder if we should use a pattern similar to
RPC's {{setProtocolEngine}}, or to {{ipc.server.read.threadpool.size}}, where
the NN or other services can call {{RPC.Builder#setnumReaders}} to override the
value. That way, the NN doesn't need to know the format of the configuration
key name.
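A rough illustration of that override pattern, assuming a builder that falls back to the configuration key only when the caller has not set a value explicitly. {{ServerBuilderSketch}} is hypothetical and uses a plain {{Map}} in place of Hadoop's {{Configuration}}; the default of 1 is also an assumption made for the sketch.

```java
import java.util.Map;

/** Hypothetical stand-in for an RPC server builder; not Hadoop's RPC.Builder. */
final class ServerBuilderSketch {
  // Key name taken from the comment above; only the builder needs to know it.
  static final String READ_THREADS_KEY = "ipc.server.read.threadpool.size";
  // Assumed default, used when neither the caller nor the configuration sets a value.
  static final int READ_THREADS_DEFAULT = 1;

  private final Map<String, String> conf;
  private int numReaders = -1; // -1 means "not set by the caller"

  ServerBuilderSketch(Map<String, String> conf) {
    this.conf = conf;
  }

  /** Explicit override; the name mirrors the setter mentioned in the comment. */
  ServerBuilderSketch setnumReaders(int numReaders) {
    this.numReaders = numReaders;
    return this;
  }

  /** Caller-supplied value wins, then the configured value, then the default. */
  int resolveNumReaders() {
    if (numReaders != -1) {
      return numReaders;
    }
    String configured = conf.get(READ_THREADS_KEY);
    return configured == null ? READ_THREADS_DEFAULT : Integer.parseInt(configured);
  }
}
```

A backoff switch could follow the same shape: the service passes a boolean to the builder, and the builder alone decides which key (if any) backs the default.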
> Enable RPC Congestion control by default
> ----------------------------------------
>
> Key: HDFS-8820
> URL: https://issues.apache.org/jira/browse/HDFS-8820
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Attachments: HDFS-8820.01.patch, HDFS-8820.02.patch
>
>
> We propose enabling RPC congestion control introduced by HADOOP-10597 by
> default.
> We enabled it on a couple of large clusters a few weeks ago and it has helped
> keep the namenodes responsive under load.