[ 
https://issues.apache.org/jira/browse/HDFS-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036660#comment-18036660
 ] 

ASF GitHub Bot commented on HDFS-16646:
---------------------------------------

github-actions[bot] closed pull request #4519: HDFS-16646. RBF: Support an 
elastic RouterRpcFairnessPolicyController
URL: https://github.com/apache/hadoop/pull/4519




> RBF: Support an elastic RouterRpcFairnessPolicyController
> ---------------------------------------------------------
>
>                 Key: HDFS-16646
>                 URL: https://issues.apache.org/jira/browse/HDFS-16646
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As we all know, `StaticRouterRpcFairnessPolicyController` is very helpful 
> for RBF in minimizing the impact of clients connecting to healthy vs. unhealthy 
> NameNodes. 
> But in a production environment, the client traffic to each NS and the 
> load on the downstream NameNodes change dynamically. With only 
> one static permit configuration, RBF cannot adapt to changes in traffic to 
> achieve optimal results. 
> So here I propose an elastic RouterRpcFairnessPolicyController to help RBF 
> adapt to traffic changes and achieve an optimal result.
> The overall idea is:
> * Each name service can be configured with exclusive permits, as in 
> `StaticRouterRpcFairnessPolicyController`.
> * TotalPermits is greater than sum(NsExclusivePermit); the difference, 
> TotalPermits - sum(NsExclusivePermit), is called SharedPermits.
> * Each name service can preempt SharedPermits after its own 
> exclusive permits are used up.
> * The number of SharedPermits that a single nameservice can preempt is 
> capped, for example at 20% of SharedPermits.
> Suppose we have 200 handlers and 5 name services, and each name service 
> is configured with a different number of exclusive permits, like:
> | NS1 | NS2 | NS3 | NS4 | NS5 | Concurrent NS |
> |-- | -- | -- | -- | -- | -- |
> | 9 | 11 | 8 | 12 | 10 | 50 |
> The `sum(NsExclusivePermit)` is 100 (9 + 11 + 8 + 12 + 10 + 50, counting the 
> Concurrent NS), so `SharedPermits = TotalPermits(200) - 
> sum(NsExclusivePermit)(100) = 100`.
> Suppose we configure each nameservice to preempt up to 20% of 
> SharedPermits; this percentage is called `elasticPercent`.
> Then, from the point of view of a single NS, the permits it can use are as 
> follows:
> - Exclusive permits, which cannot be used by other name services.
> - A limited number of SharedPermits. Whether it can actually use that many 
> depends on the number of SharedPermits remaining, because SharedPermits can 
> be preempted by all nameservices.
> If we configure `elasticPercent=100`, one nameservice can use 
> up all SharedPermits.
> If we configure `elasticPercent=0`, a nameservice can only use 
> its exclusive permits.
> If we configure `elasticPercent=20`, RBF can tolerate 5 
> unhealthy name services at the same time.
> In our production environment, we use the following configuration, and it works well:
> - RBF has 3000 handlers
> - Each nameservice has 10 exclusive permits
> - `elasticPercent` is 30%
> Of course, reasonable parameters need to be chosen according to the 
> production traffic.
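
The permit scheme described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from pull request #4519: the class `ElasticPermitSketch`, its methods, and the `Permit` enum are hypothetical names, and `elasticPercent` is taken to be a percentage of SharedPermits, as the `elasticPercent=100` case in the description suggests.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of elastic permits; not the class from the actual patch. */
final class ElasticPermitSketch {
  /** Which pool a permit came from; NONE means the request was rejected. */
  enum Permit { EXCLUSIVE, SHARED, NONE }

  private final Map<String, Semaphore> exclusive = new HashMap<>();
  // Per-nameservice cap on how many shared permits it may hold at once.
  private final Map<String, Semaphore> elasticQuota = new HashMap<>();
  private final Semaphore shared;
  private final int sharedPermits;
  private final int maxElasticPerNs;

  ElasticPermitSketch(int totalPermits, int elasticPercent,
      Map<String, Integer> exclusivePermits) {
    int sum = 0;
    for (int v : exclusivePermits.values()) {
      sum += v;
    }
    if (sum > totalPermits) {
      throw new IllegalArgumentException("exclusive permits exceed total");
    }
    // SharedPermits = TotalPermits - sum(NsExclusivePermit)
    this.sharedPermits = totalPermits - sum;
    // Assumption: elasticPercent is a percentage of SharedPermits.
    this.maxElasticPerNs = sharedPermits * elasticPercent / 100;
    this.shared = new Semaphore(sharedPermits);
    for (Map.Entry<String, Integer> e : exclusivePermits.entrySet()) {
      exclusive.put(e.getKey(), new Semaphore(e.getValue()));
      elasticQuota.put(e.getKey(), new Semaphore(maxElasticPerNs));
    }
  }

  int sharedPermits() {
    return sharedPermits;
  }

  int maxElasticPerNs() {
    return maxElasticPerNs;
  }

  /** Try the exclusive pool first, then the shared pool within the per-NS cap. */
  Permit acquire(String ns) {
    Semaphore own = exclusive.get(ns);
    if (own == null) {
      return Permit.NONE; // unknown nameservice
    }
    if (own.tryAcquire()) {
      return Permit.EXCLUSIVE;
    }
    Semaphore quota = elasticQuota.get(ns);
    if (quota.tryAcquire()) {
      if (shared.tryAcquire()) {
        return Permit.SHARED;
      }
      quota.release(); // shared pool is empty; give the quota slot back
    }
    return Permit.NONE;
  }

  /** Return a permit to the pool it was taken from. */
  void release(String ns, Permit permit) {
    if (permit == Permit.EXCLUSIVE) {
      exclusive.get(ns).release();
    } else if (permit == Permit.SHARED) {
      shared.release();
      elasticQuota.get(ns).release();
    }
  }
}
```

With the numbers from the example (200 total permits, exclusive permits of 9, 11, 8, 12, 10 and 50, `elasticPercent=20`), this sketch computes 100 shared permits and a cap of 20 shared permits per nameservice, so NS1 could hold at most 9 + 20 = 29 permits at once. Non-blocking `tryAcquire` is used here only to keep the example simple; a real controller may choose to wait or time out instead.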



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

