CR Hota created HDFS-14090:
------------------------------
Summary: RBF: Improved isolation for downstream name nodes.
Key: HDFS-14090
URL: https://issues.apache.org/jira/browse/HDFS-14090
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: CR Hota
Assignee: CR Hota
Router is a gateway to underlying name nodes. Gateway architectures, should
help minimize impact of clients connecting to healthy clusters vs unhealthy
clusters.
For example - If there are 2 name nodes downstream, and one of them is heavily
loaded with calls spiking rpc queue times, due to back pressure the same with
start reflecting on the router. As a result of this, clients connecting to
healthy/faster name nodes will also slow down as same rpc queue is maintained
for all calls at the router layer. Essentially the same IPC thread pool is used
by router to connect to all name nodes.
Currently router uses one single rpc queue for all calls. Lets discuss how we
can change the architecture and add some throttling logic for
unhealthy/slow/overloaded name nodes.
One way could be to read from current call queue, immediately identify
downstream name node and maintain a separate queue for each underlying name
node. Another simpler way is to maintain some sort of rate limiter configured
for each name node and let routers drop/reject/send error requests after
certain threshold.
This won’t be a simple change as router’s ‘Server’ layer would need redesign
and implementation. Currently this layer is the same as name node.
Opening this ticket to discuss, design and implement this feature.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]