[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

Brahma Reddy Battula (Jira) Tue, 01 Oct 2019 15:41:17 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942355#comment-16942355
 ]


Brahma Reddy Battula commented on HDFS-14090:
---------------------------------------------

[~crh] thanks for the great work here. I too liked the first approach. Sorry for the 
late reply.

Overall the approach looks good, apart from the following minor suggestions, if 
you agree.

 

i) The following might mislead. Maybe we can log that the handlers are 
overloaded, as we throw the same message.
{code:java}
LOG.debug("Permission denied for ugi: {} for method: {}",
 ugi, m.getName()); 
{code}
ii) The following will give fairness, instead of *tryAcquire()*:
{code:java}
public boolean tryAcquire(long timeout, TimeUnit unit)
{code}
iii) As this requires that the total number of handlers configured for all the 
nameservices be less than or equal to the total handlers of RBF, maybe we need 
to document this in HDFS-14558.

iv) Naming of methods and classes might be improved? E.g. instead of 
"FairnessManager.java", something like RBFRpcFairnessManager or 
RBFHandlerFairnessManager.java, and acquirepermit(..) -> acquireHandler() (it 
looks like "permit" came from the semaphore). Any thoughts?

v) Can we expose the number of available or used handlers at the NS level?
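
To make points ii) and v) concrete, here is a minimal sketch, not the code from the patch. It assumes one fair java.util.concurrent.Semaphore per nameservice. The class name HandlerPermitSketch and its methods are hypothetical and exist only for illustration. The timed tryAcquire(long, TimeUnit) honors the fairness setting, whereas the no-argument tryAcquire() may barge ahead of waiting threads even on a fair semaphore. availablePermits() is one possible source for a per-NS metric.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical class for illustration only; not the class from the patch.
class HandlerPermitSketch {
  // One semaphore per nameservice; each permit stands in for one handler.
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

  HandlerPermitSketch(Map<String, Integer> handlersPerNs) {
    for (Map.Entry<String, Integer> e : handlersPerNs.entrySet()) {
      // fair = true: waiting threads are granted permits in FIFO order.
      permits.put(e.getKey(), new Semaphore(e.getValue(), true));
    }
  }

  // Timed variant: respects the fairness policy, unlike tryAcquire().
  boolean acquireHandler(String nsId, long timeout, TimeUnit unit) {
    Semaphore s = permits.get(nsId);
    if (s == null) {
      return false; // unknown nameservice: nothing to acquire
    }
    try {
      return s.tryAcquire(timeout, unit);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt(); // preserve interrupt status
      return false;
    }
  }

  void releaseHandler(String nsId) {
    Semaphore s = permits.get(nsId);
    if (s != null) {
      s.release();
    }
  }

  // Candidate value for a per-nameservice "available handlers" metric.
  int availableHandlers(String nsId) {
    Semaphore s = permits.get(nsId);
    return s == null ? 0 : s.availablePermits();
  }
}
```

A caller would pair acquireHandler(..) with releaseHandler(..) in a try/finally block, and a zero timeout gives non-blocking behavior while still respecting fairness.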

I would like to see how the dynamic allocation (HDFS-14750) and observer load 
will be distributed (as static allocation might not be of much benefit, since 
cluster load will not be predictable).
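
On point iii), the constraint that could be documented can be sketched as a simple check. This is an assumption about how such a validation might look, not what the patch does. HandlerAllocationCheck and fitsWithinTotal are hypothetical names, and the map stands in for whatever per-nameservice configuration the Router reads.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of the patch.
class HandlerAllocationCheck {
  // Returns true when the statically configured per-nameservice handler
  // counts are non-negative and sum to at most the Router's total handlers.
  static boolean fitsWithinTotal(Map<String, Integer> handlersPerNs,
      int totalRouterHandlers) {
    long sum = 0; // long avoids int overflow when summing many entries
    for (int n : handlersPerNs.values()) {
      if (n < 0) {
        return false; // a negative allocation is never valid
      }
      sum += n;
    }
    return sum <= totalRouterHandlers;
  }
}
```

A check like this could run at Router startup, so that a misconfiguration fails fast instead of showing up later as starved handlers.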
  

> RBF: Improved isolation for downstream name nodes. {Static}
> -----------------------------------------------------------
>
>                 Key: HDFS-14090
>                 URL: https://issues.apache.org/jira/browse/HDFS-14090
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: CR Hota
>            Assignee: CR Hota
>            Priority: Major
>         Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, RBF_ 
> Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help minimize impact of clients connecting to healthy clusters vs unhealthy 
> clusters.
> For example, if there are 2 name nodes downstream, and one of them is 
> heavily loaded with calls spiking rpc queue times, due to back pressure the 
> same will start reflecting on the router. As a result of this, clients 
> connecting to healthy/faster name nodes will also slow down, as the same rpc 
> queue is maintained for all calls at the router layer. Essentially the same 
> IPC thread pool is used by router to connect to all name nodes.
> Currently router uses one single rpc queue for all calls. Let's discuss how 
> we can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from current call queue, immediately identify 
> downstream name node and maintain a separate queue for each underlying name 
> node. Another simpler way is to maintain some sort of rate limiter configured 
> for each name node and let routers drop/reject/send error requests after 
> certain threshold. 
> This won’t be a simple change as router’s ‘Server’ layer would need redesign 
> and implementation. Currently this layer is the same as name node.
> Opening this ticket to discuss, design and implement this feature.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

