[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860405#comment-16860405
 ] 

Íñigo Goiri commented on HDFS-14090:
------------------------------------

Thanks [~crh] for [^HDFS-14090-HDFS-13891.003.patch].
* I think we should use separate unit tests for FairnessManager instead of only 
relying on the full Router (which is also good), ideally covering all the error 
cases.
* For the log in grantPermission() we may want to start with an explanation 
instead of just the full exception.
* Avoid using toString() in the logger output.
* There are a bunch of checkstyle issues, which I guess Yetus will report; for 
example, {{FairnessManager#54}} has a weird indentation.
* I think we can use {{ReflectionUtils}} instead of managing the constructor 
and so on in {{FairnessPolicyController}}.
* Should we assign fairnessPolicyController to null when shutting down?
* It would be nice to add javadocs to the functions in FairnessPolicyController 
describing how they are used.
* DefaultFairnessPolicyController#assignHandlersToNameservices can define the 
errorMsg right away and output it without concatenating afterwards.
* I prefer to use {{isEmpty()}} instead of {{size() > 0}}.
* Should we use some special key instead of null for {{unassignedNS}}? Then use 
this constant.
* Can we add more details (e.g., operations) when doing {{acquirePermit()}}? 
This is very similar to the locks in the NN, and [~xkrogen] added a bunch of 
details for debugging.
* Would it be cleaner to use {{LambdaTestUtils#intercept}} instead of 
{{exceptionRule}} in these cases?
* Not sure it makes sense to fail twice in {{TestRouterHandlersFairness 
#161,162}}.
* Use lambdas in {{TestRouterHandlersFairness #136}}.
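
A few of the points above (a named constant instead of null for unassigned 
nameservices, an {{isEmpty()}} check, building the error message once, and a 
lambda-based exception check) can be sketched in plain Java. This is only an 
illustration under assumptions: {{PermitAssigner}}, {{UNASSIGNED_NS}}, the even 
split of handlers, and the local {{intercept}} helper are hypothetical and are 
not taken from the patch. The helper is a simplified stand-in that mimics the 
general shape of a lambda-based assertion; it is not the Hadoop 
{{LambdaTestUtils}} implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical sketch; none of these names come from the patch. */
class PermitAssigner {
  /** Named key for handlers not dedicated to a nameservice (instead of null). */
  static final String UNASSIGNED_NS = "unassigned";

  /** Splits handlers evenly; any remainder goes to the UNASSIGNED_NS bucket. */
  static Map<String, Integer> assign(List<String> nameservices, int handlers) {
    if (nameservices.isEmpty()) {
      throw new IllegalArgumentException("No nameservices configured");
    }
    if (handlers < nameservices.size()) {
      // Build the complete message once instead of concatenating afterwards.
      String errorMsg = String.format(
          "Cannot assign %d handlers to %d nameservices: need at least one each",
          handlers, nameservices.size());
      throw new IllegalArgumentException(errorMsg);
    }
    Map<String, Integer> permits = new LinkedHashMap<>();
    int share = handlers / nameservices.size();
    for (String ns : nameservices) {
      permits.put(ns, share);
    }
    permits.put(UNASSIGNED_NS, handlers - share * nameservices.size());
    return permits;
  }

  /** Simplified stand-in for a lambda-based exception assertion. */
  static <E extends Throwable> E intercept(Class<E> clazz, String contained,
      Callable<?> eval) {
    try {
      eval.call();
    } catch (Throwable t) {
      if (clazz.isInstance(t)
          && String.valueOf(t.getMessage()).contains(contained)) {
        return clazz.cast(t);
      }
      throw new AssertionError("Unexpected exception: " + t, t);
    }
    throw new AssertionError("Expected " + clazz.getSimpleName());
  }

  public static void main(String[] args) {
    System.out.println(assign(Arrays.asList("ns0", "ns1"), 5));
    // One lambda per expected failure, so each check fails independently.
    intercept(IllegalArgumentException.class, "Cannot assign",
        () -> assign(Arrays.asList("ns0", "ns1"), 1));
  }
}
```

Each expected failure is wrapped in its own lambda, so a test does not need 
two consecutive failing statements to cover one case.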

> RBF: Improved isolation for downstream name nodes.
> --------------------------------------------------
>
>                 Key: HDFS-14090
>                 URL: https://issues.apache.org/jira/browse/HDFS-14090
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: CR Hota
>            Assignee: CR Hota
>            Priority: Major
>         Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, RBF_ 
> Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help minimize the impact of unhealthy clusters on clients connecting to 
> healthy clusters.
> For example, if there are 2 name nodes downstream and one of them is heavily 
> loaded, with calls spiking RPC queue times, back pressure will cause the same 
> to start reflecting on the router. As a result, clients connecting to 
> healthy/faster name nodes will also slow down, since the same RPC queue is 
> maintained for all calls at the router layer. Essentially, the same IPC 
> thread pool is used by the router to connect to all name nodes.
> Currently the router uses a single RPC queue for all calls. Let's discuss how 
> we can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify 
> the downstream name node, and maintain a separate queue for each underlying 
> name node. Another, simpler way is to maintain some sort of rate limiter 
> configured for each name node and let routers drop or reject requests, or 
> return errors, after a certain threshold.
> This won’t be a simple change, as the router’s ‘Server’ layer would need 
> redesign and implementation. Currently this layer is the same as the name 
> node’s.
> Opening this ticket to discuss, design and implement this feature.
>  
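
The "rate limiter configured for each name node" option in the description 
can be sketched with one semaphore per nameservice. This is a minimal sketch 
under assumptions, not the design from the attached patches: 
{{NameservicePermits}}, its method names, and the non-blocking reject 
behaviour are hypothetical choices made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of per-nameservice permits; not the patch's code. */
class NameservicePermits {
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

  /** Creates one semaphore per nameservice with the configured permit count. */
  NameservicePermits(Map<String, Integer> permitsPerNs) {
    permitsPerNs.forEach((ns, count) -> permits.put(ns, new Semaphore(count)));
  }

  /** Returns true if a call to this nameservice may proceed; never blocks. */
  boolean acquirePermit(String nsId) {
    Semaphore s = permits.get(nsId);
    return s != null && s.tryAcquire();
  }

  /** Returns a permit once the downstream call has finished. */
  void releasePermit(String nsId) {
    Semaphore s = permits.get(nsId);
    if (s != null) {
      s.release();
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> conf = new HashMap<>();
    conf.put("slowNs", 1);
    conf.put("fastNs", 2);
    NameservicePermits p = new NameservicePermits(conf);

    p.acquirePermit("slowNs");                 // takes the only slowNs permit
    boolean rejected = !p.acquirePermit("slowNs");
    boolean fastStillServed = p.acquirePermit("fastNs");
    System.out.println("slowNs rejected: " + rejected
        + ", fastNs served: " + fastStillServed);
  }
}
```

Because each nameservice draws from its own pool, exhausting the permits of a 
slow name node does not consume the handlers available to a healthy one. A 
caller that gets {{false}} would reject the request or return an error to the 
client rather than queue it.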



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)