[ https://issues.apache.org/jira/browse/HBASE-26703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493502#comment-17493502 ]

Bryan Beaudreault commented on HBASE-26703:
-------------------------------------------

We have a highly multi-tenant environment, and we often see issues where one 
caller starts abusing HBase to the detriment of others. We've used the 
pluggable queue system to implement a prioritized queue. We send custom 
Priority levels in our RPCs, and each instance of PluggableBlockingQueue 
is actually backed by ~5 individual priority queues. Calls are added to the 
appropriate inner queue based on their priority. When we're near saturation, 
we drop calls from the lower-level queues to make room for higher-level 
callers. On the flip side, when handlers pull from the queue, they always 
pull from the highest-priority queues first. This gives strong precedence 
to high-priority callers (usually user-facing APIs).
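To make the tiered-queue idea concrete, here's a minimal sketch. This is not our actual implementation: a real version would extend HBase's PluggableBlockingQueue and hold CallRunner objects, and the class name, tier count, and eviction policy here are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: a real implementation would extend HBase's
// PluggableBlockingQueue and operate on CallRunner objects.
public class TieredCallQueue {
  private final Deque<String>[] tiers; // index 0 = highest priority
  private final int capacity;          // total capacity across all tiers
  private int size;

  @SuppressWarnings("unchecked")
  public TieredCallQueue(int numTiers, int capacity) {
    this.tiers = new Deque[numTiers];
    for (int i = 0; i < numTiers; i++) {
      tiers[i] = new ArrayDeque<>();
    }
    this.capacity = capacity;
  }

  // Enqueue a call at the given tier. Near saturation, drop a call from a
  // lower-priority tier to make room for higher-priority work.
  public synchronized boolean offer(String call, int tier) {
    if (size >= capacity) {
      for (int t = tiers.length - 1; t > tier; t--) {
        if (!tiers[t].isEmpty()) {
          tiers[t].pollLast(); // drop the newest lower-priority call
          size--;
          break;
        }
      }
      if (size >= capacity) {
        return false; // nothing lower-priority to evict; reject this call
      }
    }
    tiers[tier].offerLast(call);
    size++;
    return true;
  }

  // Handlers always drain the highest-priority tier first.
  public synchronized String poll() {
    for (Deque<String> tier : tiers) {
      if (!tier.isEmpty()) {
        size--;
        return tier.pollFirst();
      }
    }
    return null;
  }
}
```

The key property is that offer() only rejects high-priority calls once every lower tier has already been drained, while poll() starves the low tiers only when higher tiers have work.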

This was all possible prior to this JIRA, but it sets the stage for why we 
wanted a pluggable balancer as well. The above works well for insulating 
high-priority callers from lower-priority noisy neighbors, but we also wanted 
some level of protection intra-queue: if there are 10 LOW-priority callers, 
they'll all share a queue, and if one of them started abusing HBase, the other 
callers at the same priority would be affected. This is where the balancer 
comes in. We've implemented a balancer which tries to insulate intra-priority 
callers from one another in two ways:
 # Rather than strictly random balancing, we use power of 2 choices (choose 2 
queues randomly, and then enqueue into the least-loaded one).
 # In addition, we've implemented a form of [shuffle 
sharding|https://d1.awsstatic.com/builderslibrary/pdfs/workload-isolation-using-shuffle-sharding.pdf]
 so that callers are very unlikely to fully share the exact same queues with 
one another. This way a bad caller can't fully starve any other caller.
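The two techniques combine roughly like the sketch below. This is not HBase's API (a real balancer would implement the pluggable balancer interface this JIRA adds); the class name, shard size, and hashing scheme are assumptions for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of shuffle sharding + power of 2 choices; not HBase's actual API.
public class ShuffleShardBalancer {
  private final int numQueues;
  private final int shardSize;     // how many queues each caller may use
  private final int[] queueDepths; // stand-in for real per-queue backlog

  public ShuffleShardBalancer(int numQueues, int shardSize) {
    this.numQueues = numQueues;
    this.shardSize = shardSize;
    this.queueDepths = new int[numQueues];
  }

  // Shuffle sharding: deterministically derive a small set of queue indices
  // from the caller id, so two callers rarely share the exact same set.
  int[] shardFor(String callerId) {
    Random r = new Random(callerId.hashCode());
    Set<Integer> shard = new LinkedHashSet<>();
    while (shard.size() < shardSize) {
      shard.add(r.nextInt(numQueues));
    }
    return shard.stream().mapToInt(Integer::intValue).toArray();
  }

  // Power of 2 choices within the caller's shard: sample two member queues
  // and route to the one with the smaller backlog.
  public int getQueueIndex(String callerId) {
    int[] shard = shardFor(callerId);
    ThreadLocalRandom rnd = ThreadLocalRandom.current();
    int a = shard[rnd.nextInt(shard.length)];
    int b = shard[rnd.nextInt(shard.length)];
    int chosen = queueDepths[a] <= queueDepths[b] ? a : b;
    queueDepths[chosen]++;
    return chosen;
  }
}
```

Because each caller's shard is a random-but-deterministic subset of the queues, a single abusive caller can saturate at most its own shard, and power of 2 choices keeps load within a shard roughly even.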

Obviously this isn't perfect, since at the end of the day everyone is still 
sharing the same underlying disk/CPU/etc., but it's helped a bit in our 
testing. We're still experimenting, so we don't yet have something to actually 
upstream for everyone, but that would be a goal if we come away with something 
generally applicable.

One might also ask why we don't just use quotas or other rate limiting, which 
is in fact what we've had bespoke-built into our HBase fork for years and are 
hoping to drop in our upgrade to HBase 2. A big problem that has continually 
cropped up there is how much minor tweaking these rate limits have needed, 
because our environment is highly variable over time. The system with the 
custom queue and custom balancer should require very little ongoing tuning, 
since it mostly just aims to enforce flexible fairness.

We're probably going to do a blog post with more details at some point in the 
future, and maybe upstream parts of it if there's interest.

> Allow configuration of IPC queue balancer
> -----------------------------------------
>
>                 Key: HBASE-26703
>                 URL: https://issues.apache.org/jira/browse/HBASE-26703
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Minor
>             Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> Currently we randomly assign IPC calls to queues using a RandomQueueBalancer, 
> which relies on ThreadLocalRandom. I would like to make that configurable so 
> that one can plug in their own queue balancer. This usefully combines with 
> the existing ability to specify a pluggable queue type.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)