[ 
https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17302:
------------------------------
    Attachment: HDFS-17302.003.patch

> RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-17302
>                 URL: https://issues.apache.org/jira/browse/HDFS-17302
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: rbf
>            Reporter: Jian Zhang
>            Assignee: Jian Zhang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch, 
> HDFS-17302.003.patch
>
>
> h2. Current shortcomings
> [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a
> StaticRouterRpcFairnessPolicyController that supports configuring a
> dedicated number of handlers for each nameservice (ns). With it, the router
> isolates nameservices from one another, so an ns under heavy load does not
> affect the router's access to an ns under normal load. However, the
> StaticRouterRpcFairnessPolicyController still falls short in several ways:
> 1. *Configuration is inconvenient and error-prone*: To use
> StaticRouterRpcFairnessPolicyController, I first need to know how many
> handlers the router has in total and how many nameservices it currently
> serves. I then have to calculate how many handlers to allocate to each ns so
> that the sum does not exceed the router's total, while also choosing an
> allocation that performs well. If I configure even one handler too many for
> a single ns, the sum exceeds the number of handlers the router owns and the
> router fails to start. I then have to investigate why startup failed and
> rework the handler count for each ns. In addition, whenever I change the
> total number of handlers on the router, I have to re-allocate handlers to
> every ns, which adds to the operation and maintenance burden.
> 2. *Adding a new ns is not supported*: If a new ns is added to the cluster
> while the router is running and a mount is added for it, the ns cannot be
> accessed through the router because no handlers are allocated to it. We
> must reconfigure the handler counts and refresh the configuration before
> the router can access the ns. Reconfiguring the handler counts brings us
> back to shortcoming 1: configuration is inconvenient and error-prone.
> 3. *Handlers are wasted*: The main purpose of
> RouterRpcFairnessPolicyController is to let the router access nameservices
> under normal load without being affected by nameservices under heavy load.
> However, not every ns is heavily loaded, and those that are do not stay
> heavily loaded 24 hours a day. An ns may be busy only during certain
> periods, such as 0 to 8 o'clock, and see normal load the rest of the time.
> Assume there are 2 ns and each is allocated half of the handlers. Assume
> that ns1 has many requests from 0 to 14 o'clock and almost none from 14 to
> 24 o'clock, while ns2 has many requests from 12 to 24 o'clock and almost
> none from 0 to 12 o'clock. Between 0 and 12 o'clock and between 14 and 24
> o'clock, only one ns is busy and the other has almost no requests, so half
> of the handlers are wasted.
> 4. *Only isolation, no sharing*: The StaticRouterRpcFairnessPolicyController
> supports isolation but not sharing. I think isolation is a means of
> improving the router's access to normally loaded nameservices, not a goal in
> itself. It is very unlikely that every ns in the cluster is heavily loaded.
> On the contrary, in most scenarios only a few nameservices are heavily
> loaded and the rest are under normal load. We need to isolate the handlers
> of heavily loaded nameservices from those of normally loaded ones so that
> the former do not degrade the latter. However, nameservices with the same
> load profile, whether all normal or all heavy, do not need to be isolated
> from each other and can share the router's handlers. This performs better
> than assigning a fixed number of handlers to each ns, because each ns can
> use all the handlers of the router.
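
To make shortcomings 1 and 3 concrete, here is a minimal Java sketch of the
arithmetic involved. It is not the router's code: the class
StaticAllocationSketch and the methods validate and idleHandlers are
hypothetical names, and rejecting an over-allocated configuration simply
mirrors the startup failure described above. The busy windows are the ones
from the two-ns example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StaticAllocationSketch {

  /** Rejects a fixed allocation whose per-ns counts exceed the router total. */
  static void validate(int totalHandlers, Map<String, Integer> perNs) {
    int sum = 0;
    for (int count : perNs.values()) {
      sum += count;
    }
    if (sum > totalHandlers) {
      throw new IllegalArgumentException(
          "Configured " + sum + " handlers, but the router has only " + totalHandlers);
    }
  }

  /** Sums the handlers reserved for nameservices that are not busy at the given hour. */
  static int idleHandlers(Map<String, Integer> perNs, Map<String, int[]> busyHours, int hour) {
    int idle = 0;
    for (Map.Entry<String, Integer> entry : perNs.entrySet()) {
      int[] window = busyHours.get(entry.getKey()); // {startInclusive, endExclusive}
      boolean busy = window != null && hour >= window[0] && hour < window[1];
      if (!busy) {
        idle += entry.getValue();
      }
    }
    return idle;
  }

  public static void main(String[] args) {
    Map<String, Integer> perNs = new LinkedHashMap<>();
    perNs.put("ns1", 5);
    perNs.put("ns2", 5);
    validate(10, perNs); // 5 + 5 fits exactly into 10 handlers

    Map<String, int[]> busyHours = new LinkedHashMap<>();
    busyHours.put("ns1", new int[] {0, 14});
    busyHours.put("ns2", new int[] {12, 24});
    System.out.println("idle at 06:00 = " + idleHandlers(perNs, busyHours, 6));
    System.out.println("idle at 13:00 = " + idleHandlers(perNs, busyHours, 13));

    perNs.put("ns2", 6); // one handler too many
    try {
      validate(10, perNs);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With 10 handlers split 5/5, half of the handlers sit idle at 06:00 because
only ns1 is busy, and raising ns2 from 5 to 6 is enough to make the
configuration invalid.
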
> h2. New features
> Because the StaticRouterRpcFairnessPolicyController described above has
> deficiencies in usability and performance, I provide a new
> RouterRpcFairnessPolicyController,
> ProportionRouterRpcFairnessPolicyController (maybe there is a better name),
> to address these shortcomings.
> 1. *More user-friendly configuration*: Supports allocating handlers
> proportionally to each ns. For example, if we give ns1 a handler ratio of
> 0.2, ns1 will use 0.2 of the total number of handlers on the router. With
> this method, we do not need to know in advance how many handlers the router
> has.
> 2. *Sharing and isolation*: Sharing is as important as isolation. We allow
> the sum of handlers across all ns to exceed the total number of handlers.
> For example, with 10 handlers and 3 ns, we can allocate 5 handlers (0.5) to
> each of ns1, ns2, and ns3. This feature is very important. Consider the
> following scenarios:
> - Only one ns is busy during a period of time: Assume that ns1 has more
> requests from 0 to 8 o'clock, ns2 has more requests from 8 to 16 o'clock, and
> ns3 has more requests from 16 to 24 o'clock. Then, in any time period, the
> busy ns uses at most half of the handlers, and the other two normal ns share
> the remaining half. In this way, isolation is still satisfied, and compared
> with StaticRouterRpcFairnessPolicyController we can use more handlers to
> serve requests for both busy and normal ns (with
> StaticRouterRpcFairnessPolicyController, each ns uses 3 handlers,
> [ns1:3 ns2:3 ns3:3]; now each ns can use 5 handlers).
> - Only ns1 is busy: Assume that only ns1 is ever busy and that requests for
> ns2 and ns3 are normal (few in number and fast, because the downstream
> namenodes are under no pressure). We can give ns1 5 handlers (0.5) and give
> ns2 and ns3 10 handlers (1) each. Since requests to ns2 and ns3 are few and
> short, they will not have a major impact on the performance of ns1, and
> because ns1 is limited to at most half of the handlers, isolation is still
> met.
> 3. *Transparent extension*: Adding a new ns does not require refreshing the
> configuration. If no handlers are explicitly assigned to an ns, we can
> assign it a certain proportion of handlers by default.
> 4. *Fully compatible*: The new RouterRpcFairnessPolicyController covers
> everything StaticRouterRpcFairnessPolicyController offers. If we want
> isolation only, without sharing, we can allocate 0.3 to ns2, 0.3 to ns3,
> and 0.4 to ns1. This is still more convenient than using the original
> StaticRouterRpcFairnessPolicyController, because we don't need to know how
> many handlers the router has in total.
> Therefore, the new RouterRpcFairnessPolicyController is more flexible, has 
> better performance, and is more suitable for actual production environments.
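
The proportional scheme in the description above can be sketched in a few
lines of Java. This is an illustration under stated assumptions, not the code
in the attached patch: the class ProportionAllocationSketch and its methods
are hypothetical, rounding down with a minimum of one handler is my own
choice, and using one counting semaphore per ns is just one plausible way to
cap concurrency. The sketch also does not model the router's global handler
pool, which still bounds total concurrency when the per-ns caps add up to
more than the total.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

class ProportionAllocationSketch {

  /** Converts a ratio in (0, 1] into a handler count, never less than one. */
  static int handlersFor(double ratio, int totalHandlers) {
    if (ratio <= 0.0 || ratio > 1.0) {
      throw new IllegalArgumentException("ratio must be in (0, 1]: " + ratio);
    }
    // Assumption: round down, but always leave the ns at least one handler.
    return Math.max(1, (int) Math.floor(ratio * totalHandlers));
  }

  /** Builds one semaphore per ns; nameservices without a ratio get defaultRatio. */
  static Map<String, Semaphore> buildPermits(int totalHandlers, List<String> nameservices,
      Map<String, Double> ratios, double defaultRatio) {
    Map<String, Semaphore> permits = new LinkedHashMap<>();
    for (String ns : nameservices) {
      double ratio = ratios.getOrDefault(ns, defaultRatio);
      permits.put(ns, new Semaphore(handlersFor(ratio, totalHandlers)));
    }
    return permits;
  }

  public static void main(String[] args) {
    Map<String, Double> ratios = new LinkedHashMap<>();
    ratios.put("ns1", 0.5); // busy ns, capped at half of the handlers
    ratios.put("ns2", 1.0); // normal ns, may use every handler
    // ns3 has no explicit ratio and falls back to the default of 0.5.
    Map<String, Semaphore> permits =
        buildPermits(10, Arrays.asList("ns1", "ns2", "ns3"), ratios, 0.5);
    permits.forEach((ns, s) -> System.out.println(ns + " -> " + s.availablePermits()));
  }
}
```

With 10 handlers this gives ns1 and ns3 a cap of 5 and ns2 a cap of 10. The
caps sum to 20, which is the sharing case, while ns1 still cannot take more
than half of the handlers, which is the isolation case.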



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
