[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944029#comment-16944029
 ] 

Erik Krogen commented on HDFS-14090:
------------------------------------

Hey [~crh], strong apologies for my 1-2 days to get a review turning into 2 
weeks. But here I am finally :) This is great work overall!
 
# {{RouterRpcClient#concurrentNS}} should be a {{static}} {{CONSTANT}}. Also it 
seems that this is redefined in {{StaticFairnessPolicyController}}, we should 
only define it in one place. Also duplicated in {{TestRouterFairnessManager}}.
# {{FairnessManager}} L55, you don't need {{throws IOException}} here
# Can you explain why we have {{FairnessManager}} instead of having 
{{RouterRpcClient}} directly access the {{FairnessPolicyController}}? It seems 
like it doesn't really add much benefit?
# I don't see the point in having {{FairnessPolicyController#acquirePermit()}} 
throw an exception to indicate failure, rather than returning a boolean, if 
{{FairnessManager}} is just going to catch it and convert it to a boolean 
anyway. This is on a hot path (if there are permit rejections being thrown then 
the Router must be busy) and creating exceptions is somewhat costly IIRC.
# In {{FairnessPolicyController}}, why not just call the initialization method 
{{init()}} or {{initialize()}} instead of {{assignHandlersToNameservices()}} ? 
But looking more closely, this method is only ever called in the constructor 
for {{StaticFairnessPolicyController}}. I think you should either refactor this 
so that an initialization method is called by 
{{FederationUtil#newFairnessPolicyResolver}} (and have a no-op constructor), or 
remove this method from the interface since it isn't used externally.
# In the Javadoc for {{NoFairnessPolicyController}}, you have an at-symbol like 
you intended to link something, is this a typo?
# {{FederationUtil#newFairnessPolicyResolver}} -> should this be called 
{{newFairnessPolicyController}} ?
# {{StaticFairnessPolicyController}} L78, I don't think we need the null-check 
here, as {{getAllConfiguredNS}} won't ever put null values into the set. I 
don't don't think you need the containment check on L83, since you're iterating 
over a set (no duplicates) to add items to the map.
# {{StaticFairnessPolicyController}} L113, you are calling {{toString()}} on a 
{{Set}} which doesn't have a good {{toString()}} implementation, you should use 
something else.
# {{StaticFairnessPolicyController}} L183, you are re-joining 
{{namenodeSplit}}, you can just use the original variable {{namenode}}
# Within {{TestRouterFairnessManager}}, {{createConf}} and {{generateBaseConf}} 
seem to be the same thing?
# Within {{TestRouterFairnessManager#verifyInstantiationError}}, would it be 
more idiomatic/simpler to use the {{GenericTestUtils.LogCapturer}}?
# {{TestRouterFairnessManager}} really seems like it's testing the logic of 
{{StaticFairnessPolicyController}}, not {{FairnessManager}}. Should we rename 
this test?
# For {{TestRouterHandlersFairness}}, I like the intent, but I'm worried this 
is going to become another flaky test in the build. Can you convince me that it 
won't be flaky?

> RBF: Improved isolation for downstream name nodes. {Static}
> -----------------------------------------------------------
>
>                 Key: HDFS-14090
>                 URL: https://issues.apache.org/jira/browse/HDFS-14090
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: CR Hota
>            Assignee: CR Hota
>            Priority: Major
>         Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, RBF_ 
> Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures, should 
> help minimize impact of clients connecting to healthy clusters vs unhealthy 
> clusters.
> For example - If there are 2 name nodes downstream, and one of them is 
> heavily loaded with calls spiking rpc queue times, due to back pressure the 
> same with start reflecting on the router. As a result of this, clients 
> connecting to healthy/faster name nodes will also slow down as same rpc queue 
> is maintained for all calls at the router layer. Essentially the same IPC 
> thread pool is used by router to connect to all name nodes.
> Currently router uses one single rpc queue for all calls. Lets discuss how we 
> can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from current call queue, immediately identify 
> downstream name node and maintain a separate queue for each underlying name 
> node. Another simpler way is to maintain some sort of rate limiter configured 
> for each name node and let routers drop/reject/send error requests after 
> certain threshold. 
> This won’t be a simple change as router’s ‘Server’ layer would need redesign 
> and implementation. Currently this layer is the same as name node.
> Opening this ticket to discuss, design and implement this feature.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to