[
https://issues.apache.org/jira/browse/HDFS-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Íñigo Goiri updated HDFS-16646:
-------------------------------
Description:
### Description of PR
As we all know, `StaticRouterRpcFairnessPolicyController` is very helpful
for RBF to minimize the impact of clients connecting to healthy vs. unhealthy
NameNodes.
But in a prod environment, the client traffic to each NS and the pressure on
the downstream NameNodes change dynamically. So with only one static permit
configuration, RBF cannot adapt to changes in traffic to achieve optimal
results.
So here I propose an elastic RouterRpcFairnessPolicyController to help RBF
adapt to traffic changes to achieve an optimal result.
The overall idea is:
- Each nameservice can be configured with exclusive permits, as in
`StaticRouterRpcFairnessPolicyController`
- TotalPermits is greater than sum(NsExclusivePermit), and the difference,
TotalPermits - sum(NsExclusivePermit), is marked as SharedPermits
- Each nameservice can preempt SharedPermits once its own exclusive permits
are used up.
- But the number of SharedPermits that each nameservice can preempt should be
limited, for example to 20% of SharedPermits.
Suppose we have 200 handlers and 5 nameservices, and each nameservice is
configured with a different number of exclusive permits, like:
| NS1 | NS2 | NS3 | NS4 | NS5 | Concurrent NS |
|-- | -- | -- | -- | -- | -- |
| 9 | 11 | 8 | 12 | 10 | 50 |
The `sum(NsExclusivePermit)` is 100, and
`SharedPermits = TotalPermits(200) - sum(NsExclusivePermit)(100) = 100`
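
The arithmetic above can be checked with a small standalone sketch. The class and method names here are hypothetical and are not taken from the PR; the numbers come from the table, with the concurrent entry counted as one more row of exclusive permits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Hadoop).
public class SharedPermitsExample {

  /** SharedPermits = TotalPermits - sum(NsExclusivePermit). */
  public static int sharedPermits(int totalPermits,
      Map<String, Integer> exclusivePermits) {
    int sum = 0;
    for (int permits : exclusivePermits.values()) {
      sum += permits;
    }
    if (sum > totalPermits) {
      throw new IllegalArgumentException(
          "Exclusive permits (" + sum + ") exceed total permits");
    }
    return totalPermits - sum;
  }

  /** Exclusive permits from the table in the description. */
  public static Map<String, Integer> exampleExclusivePermits() {
    Map<String, Integer> permits = new LinkedHashMap<>();
    permits.put("NS1", 9);
    permits.put("NS2", 11);
    permits.put("NS3", 8);
    permits.put("NS4", 12);
    permits.put("NS5", 10);
    permits.put("concurrent", 50);
    return permits;
  }

  public static void main(String[] args) {
    // 200 handlers in total, 100 of them reserved as exclusive permits.
    System.out.println(sharedPermits(200, exampleExclusivePermits())); // 100
  }
}
```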
Suppose we configure that each nameservice can preempt up to 20% of
SharedPermits, marked as `elasticPercent`.
Then, from the point of view of a single NS, the permits it can use are as
follows:
- Exclusive permits, which cannot be used by other nameservices.
- Limited SharedPermits. Whether it can actually use that many shared permits
depends on the number of SharedPermits remaining, because the SharedPermits
can be preempted by all nameservices.
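
A minimal sketch of this acquire order, assuming semaphores for both kinds of permits, could look like the following. It is not the implementation in the PR and does not implement the real `RouterRpcFairnessPolicyController` interface; the class name, the `Permit` enum, and the rounding down of the per-nameservice cap are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Hypothetical standalone sketch, not the controller from the PR.
public class ElasticPermitSketch {

  /** Which pool a permit was taken from; NONE means the call is rejected. */
  public enum Permit { EXCLUSIVE, SHARED, NONE }

  private final Map<String, Semaphore> exclusive = new HashMap<>();
  // Per-nameservice quota that bounds how many shared permits it may hold.
  private final Map<String, Semaphore> elasticQuota = new HashMap<>();
  private final Semaphore shared;

  public ElasticPermitSketch(int totalPermits,
      Map<String, Integer> exclusivePermits, int elasticPercent) {
    int sum = 0;
    for (int permits : exclusivePermits.values()) {
      sum += permits;
    }
    int sharedPermits = totalPermits - sum;
    if (sharedPermits < 0) {
      throw new IllegalArgumentException("Exclusive permits exceed total");
    }
    this.shared = new Semaphore(sharedPermits);
    // Assumption: the cap is a percentage of SharedPermits, rounded down.
    int cap = sharedPermits * elasticPercent / 100;
    for (Map.Entry<String, Integer> e : exclusivePermits.entrySet()) {
      exclusive.put(e.getKey(), new Semaphore(e.getValue()));
      elasticQuota.put(e.getKey(), new Semaphore(cap));
    }
  }

  /** Try the exclusive permits first, then the limited shared permits. */
  public Permit acquire(String ns) {
    Semaphore own = exclusive.get(ns);
    if (own == null) {
      return Permit.NONE; // unknown nameservice
    }
    if (own.tryAcquire()) {
      return Permit.EXCLUSIVE;
    }
    Semaphore quota = elasticQuota.get(ns);
    if (quota.tryAcquire()) {
      if (shared.tryAcquire()) {
        return Permit.SHARED;
      }
      quota.release(); // shared pool is exhausted by other nameservices
    }
    return Permit.NONE;
  }

  /** Return a permit to the pool it was taken from. */
  public void release(String ns, Permit permit) {
    switch (permit) {
      case EXCLUSIVE:
        exclusive.get(ns).release();
        break;
      case SHARED:
        shared.release();
        elasticQuota.get(ns).release();
        break;
      default:
        break;
    }
  }
}
```

The caller has to remember which kind of permit it received so that `release` returns it to the right pool; a real controller would need to track this internally, for example per handler thread.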
If we configure `elasticPercent=100`, one nameservice can use up all
SharedPermits.
If we configure `elasticPercent=0`, a nameservice can only use its exclusive
permits.
If we configure `elasticPercent=20`, RBF can tolerate 5 unhealthy nameservices
at the same time.
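
As a worked example of these three settings, the sketch below computes the most permits NS1 (9 exclusive permits) could ever hold when SharedPermits is 100. It assumes the cap is a percentage of SharedPermits, as in the `elasticPercent=100` case, and rounds down; the class and method names are hypothetical.

```java
// Hypothetical helper, for illustration only (not part of Hadoop).
public class ElasticCapExample {

  /** Upper bound on the shared permits one nameservice may hold. */
  public static int maxSharedPerNs(int sharedPermits, int elasticPercent) {
    if (elasticPercent < 0 || elasticPercent > 100) {
      throw new IllegalArgumentException("elasticPercent must be in [0, 100]");
    }
    // Assumption: integer division, i.e. the cap is rounded down.
    return sharedPermits * elasticPercent / 100;
  }

  /** Exclusive permits plus the capped share of SharedPermits. */
  public static int maxPermitsForNs(int exclusivePermits, int sharedPermits,
      int elasticPercent) {
    return exclusivePermits + maxSharedPerNs(sharedPermits, elasticPercent);
  }

  public static void main(String[] args) {
    int sharedPermits = 100;
    int ns1Exclusive = 9;
    for (int percent : new int[] {0, 20, 100}) {
      System.out.println("elasticPercent=" + percent + " -> NS1 max permits = "
          + maxPermitsForNs(ns1Exclusive, sharedPermits, percent));
    }
    // elasticPercent=0 -> NS1 max permits = 9
    // elasticPercent=20 -> NS1 max permits = 29
    // elasticPercent=100 -> NS1 max permits = 109
  }
}
```

These are upper bounds only: the shared part is available to NS1 only while other nameservices have not already taken those permits.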
In our prod environment, we configured it as follows, and it works well:
- RBF has 3000 handlers
- Each nameservice has 10 exclusive permits
- `elasticPercent` is 30%
Of course, we need to configure reasonable parameters according to the prod
traffic.
> RBF: Support an elastic RouterRpcFairnessPolicyController
> ---------------------------------------------------------
>
> Key: HDFS-16646
> URL: https://issues.apache.org/jira/browse/HDFS-16646
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h