[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898235#comment-16898235 ]

Íñigo Goiri commented on HDFS-14090:
------------------------------------

Thanks [~crh] for the update.
I had the chance to go a little deeper, and I have a concern about the "Permit" 
wording.
It is easy to follow, but I'm not sure it is the most common terminology.
{{PermitAllocationException}} is essentially a setup exception, so it is closer 
to {{IllegalArgumentException}}.
{{NoPermitAvailableException}} is more like a "too many requests" condition; I'm 
not sure what the best mapping would be here.
Eventually it surfaces to the client as the Router being overloaded, which is 
good; however, for code readability we may want to use a more intuitive concept 
throughout the code.
[~xkrogen], you have been working on RPC fairness; do you have any suggestions 
for the terminology?

Other minor comments:
* {{logFinalAssignment()}} is a one-liner that could just be inlined; 
{{logAssignment()}} is at least used a few more times.
* {{LOG.info("Final permit allocation table {}", this.permits.toString());}} has 
no need for toString(); the {} placeholder already formats the argument, and 
only when INFO is enabled.
* Instead of having two configs, one to enable fairness and one for the 
implementation, we could have just the implementation config and, by default, 
provide a dummy implementation that doesn't do fairness. Then we would rename the 
current DefaultFairnessPolicyController to something more descriptive (to reflect 
an equal, linear, or similar allocation).
* Coming back to {{PermitAllocationException}}, right now we are basically 
logging and swallowing it; what about failing the whole startup instead?
* {{TestRouterFairnessManager#160}} could just use a for loop.
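
To make the single-config suggestion and the fail-fast point concrete, here is a minimal Java sketch. All names ({{FairnessPolicy}}, {{NoFairnessPolicy}}, {{EqualSplitFairnessPolicy}}) are hypothetical and not taken from the patch; it assumes permits are modeled as one {{java.util.concurrent.Semaphore}} per nameservice and that the handler count is split evenly. The no-op class would be the default value of the single implementation config, and a configuration that cannot be satisfied throws {{IllegalArgumentException}} from the constructor so startup fails instead of logging and continuing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Hypothetical controller interface; only the implementation class would be configured.
interface FairnessPolicy {
  // Returns true if the caller may send a call to the given nameservice now.
  boolean tryAcquire(String nsId);

  // Returns the slot taken by a successful tryAcquire.
  void release(String nsId);
}

// Hypothetical default: admits every call, i.e. fairness is effectively disabled.
final class NoFairnessPolicy implements FairnessPolicy {
  @Override
  public boolean tryAcquire(String nsId) {
    return true;
  }

  @Override
  public void release(String nsId) {
    // Nothing was taken, so there is nothing to return.
  }
}

// Hypothetical "equal split" policy: handlers are divided evenly per nameservice.
final class EqualSplitFairnessPolicy implements FairnessPolicy {
  private final Map<String, Semaphore> permits = new HashMap<>();

  EqualSplitFairnessPolicy(int handlerCount, List<String> nameservices) {
    if (nameservices.isEmpty() || handlerCount < nameservices.size()) {
      // Fail fast: a bad setup aborts startup instead of being logged and swallowed.
      throw new IllegalArgumentException("Cannot split " + handlerCount
          + " handlers across " + nameservices.size() + " nameservices");
    }
    int perNs = handlerCount / nameservices.size();
    for (String ns : nameservices) {
      permits.put(ns, new Semaphore(perNs));
    }
  }

  @Override
  public boolean tryAcquire(String nsId) {
    Semaphore s = permits.get(nsId);
    // Unknown nameservices are rejected rather than silently admitted.
    return s != null && s.tryAcquire();
  }

  @Override
  public void release(String nsId) {
    Semaphore s = permits.get(nsId);
    if (s != null) {
      s.release();
    }
  }

  // Remaining slots for a nameservice (for inspection and tests).
  int available(String nsId) {
    return permits.get(nsId).availablePermits();
  }
}
```

With this shape, disabling fairness is just a matter of pointing the implementation config at the no-op class, so no separate enable flag is needed, and the equal-split class can carry a name that says what it allocates.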

> RBF: Improved isolation for downstream name nodes.
> --------------------------------------------------
>
>                 Key: HDFS-14090
>                 URL: https://issues.apache.org/jira/browse/HDFS-14090
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: CR Hota
>            Assignee: CR Hota
>            Priority: Major
>         Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, RBF_ Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help minimize the impact of unhealthy clusters on clients connecting to 
> healthy clusters.
> For example, if there are 2 name nodes downstream and one of them is heavily 
> loaded with calls spiking RPC queue times, back pressure will cause the same 
> to start showing up on the router. As a result, clients connecting to 
> healthy/faster name nodes will also slow down, because the same RPC queue is 
> maintained for all calls at the router layer. Essentially, the router uses 
> the same IPC thread pool to connect to all name nodes.
> Currently, the router uses a single RPC queue for all calls. Let's discuss how 
> we can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify the 
> downstream name node, and maintain a separate queue for each underlying name 
> node. Another, simpler way is to maintain some sort of rate limiter configured 
> for each name node and let routers drop or reject requests (or return errors) 
> after a certain threshold.
> This won’t be a simple change, as the router’s ‘Server’ layer would need to be 
> redesigned and reimplemented. Currently this layer is the same as the name node’s.
> Opening this ticket to discuss, design and implement this feature.
>  
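
The "simpler way" in the description (a per-name-node limit that rejects requests past a threshold) can be sketched roughly as follows. This is an illustration under assumptions, not the design in the attached patches: {{PerNameserviceLimiter}} and {{NameserviceOverloadedException}} are hypothetical names, and the threshold is modeled as a fixed number of concurrent in-flight calls per nameservice rather than a rate. A call that finds its nameservice's slots exhausted is rejected immediately instead of waiting, so a slow name node cannot tie up handlers needed by the healthy ones.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical error surfaced to the client when a nameservice's share is exhausted.
class NameserviceOverloadedException extends Exception {
  NameserviceOverloadedException(String nsId) {
    super("Too many concurrent calls to " + nsId + "; retry later");
  }
}

// Hypothetical per-nameservice admission control: each downstream name node gets
// its own bounded number of in-flight calls instead of sharing one pool.
class PerNameserviceLimiter {
  private final Map<String, Semaphore> inFlight = new ConcurrentHashMap<>();
  private final int maxPerNs;

  PerNameserviceLimiter(int maxPerNs) {
    this.maxPerNs = maxPerNs;
  }

  <T> T invoke(String nsId, Callable<T> call) throws Exception {
    Semaphore slots = inFlight.computeIfAbsent(nsId, k -> new Semaphore(maxPerNs));
    if (!slots.tryAcquire()) {
      // Reject right away rather than blocking a handler behind a slow nameservice.
      throw new NameserviceOverloadedException(nsId);
    }
    try {
      return call.call();
    } finally {
      // Always give the slot back, even if the downstream call failed.
      slots.release();
    }
  }
}
```

Rejecting with {{tryAcquire()}} (rather than blocking on {{acquire()}}) is what keeps the shared handler pool free; the client gets an error it can retry, which maps naturally onto a "too many requests" style exception.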



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
