[
https://issues.apache.org/jira/browse/HBASE-26703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493502#comment-17493502
]
Bryan Beaudreault commented on HBASE-26703:
-------------------------------------------
We have a very multi-tenant environment and we often have issues where one
caller may start abusing HBase to the detriment of others. We've used the
pluggable queue system to implement a prioritized queue. We send custom
Priority levels in our RPCs, and each instance of PluggableBlockingQueue
actually ends up backed by ~5 individual priority queues. The calls get added
to the appropriate inner queue based on their priority. When we're near
saturation, we drop calls from the lower-priority queues in order to make room
for higher-priority callers. On the flip side, when handlers pull from the
queue, they always pull from the highest-priority queues first. This gives
strong precedence to high-priority callers (usually user-facing APIs).
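The tiering described above can be sketched roughly as follows. This is a minimal, single-threaded illustration, not HBase's actual PluggableBlockingQueue API; the class name TieredCallQueue and its methods are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: one logical queue backed by several priority tiers,
// with eviction of lower-priority calls near saturation.
public class TieredCallQueue<T> {
  private final Deque<T>[] tiers;
  private final int totalCapacity;
  private int size = 0;

  @SuppressWarnings("unchecked")
  public TieredCallQueue(int numTiers, int totalCapacity) {
    this.tiers = (Deque<T>[]) new Deque[numTiers];
    for (int i = 0; i < numTiers; i++) {
      tiers[i] = new ArrayDeque<>();
    }
    this.totalCapacity = totalCapacity;
  }

  /** Enqueue at the given priority; when full, drop a call from the lowest
   *  non-empty tier strictly below this priority to make room. */
  public boolean offer(T call, int priority) {
    if (size < totalCapacity) {
      tiers[priority].addLast(call);
      size++;
      return true;
    }
    for (int p = 0; p < priority; p++) { // look for a lower-priority victim
      if (!tiers[p].isEmpty()) {
        tiers[p].pollFirst();            // drop the victim call
        tiers[priority].addLast(call);
        return true;
      }
    }
    return false;                        // nothing to displace: reject
  }

  /** Handlers always pull from the highest-priority non-empty tier first. */
  public T poll() {
    for (int p = tiers.length - 1; p >= 0; p--) {
      if (!tiers[p].isEmpty()) {
        size--;
        return tiers[p].pollFirst();
      }
    }
    return null;
  }
}
```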
This was all possible prior to this JIRA, but it sets the stage for why we
wanted a pluggable balancer as well... The above works well for insulating high
priority callers from lower priority noisy neighbors. But we also wanted some
level of protection intra-queue, i.e. if there are 10 LOW priority callers,
they'll all share a queue. If one of those callers started abusing HBase, the
other callers at the same priority would be affected. This is where the
balancer comes in. We've implemented a balancer which tries to insulate
intra-priority callers from one another in two ways:
# Rather than strictly random balancing, we use the power of two choices (pick
two queues at random, then enqueue into the less-loaded of the pair).
# In addition, we've implemented a form of [shuffle
sharding|https://d1.awsstatic.com/builderslibrary/pdfs/workload-isolation-using-shuffle-sharding.pdf]
so that callers are very unlikely to fully share the exact same queues with
one another. This way a bad caller can't fully starve any other caller.
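The power-of-two-choices step can be sketched like this. It is a hedged illustration, not the balancer we actually run; PowerOfTwoBalancer and its choose method are hypothetical names:

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of power-of-two-choices balancing: sample two distinct
// queues uniformly at random and route the call to the less-loaded one.
public final class PowerOfTwoBalancer {
  private PowerOfTwoBalancer() {}

  /** Returns the index of the less-loaded of two randomly chosen queues. */
  public static int choose(List<? extends Queue<?>> queues) {
    int n = queues.size();
    if (n == 1) {
      return 0;
    }
    ThreadLocalRandom rnd = ThreadLocalRandom.current();
    int a = rnd.nextInt(n);
    int b = rnd.nextInt(n - 1);
    if (b >= a) {
      b++; // shift so b is always a distinct index from a
    }
    return queues.get(a).size() <= queues.get(b).size() ? a : b;
  }
}
```

The appeal of this scheme is that it needs no global state or tuning: two random samples plus one load comparison already gives an exponential improvement in max queue imbalance over purely random placement.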
Obviously this isn't perfect since at the end of the day everyone's still
sharing the same underlying disk/cpu/etc, but it's helped a bit in our testing.
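The shuffle-sharding idea can be sketched as a deterministic per-caller shuffle of the queue indexes. Again this is an assumed illustration (ShuffleShard and shardFor are hypothetical names), not the implementation from the AWS paper or our fork:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical shuffle-sharding sketch: each caller is mapped to a small,
// stable subset of the queues, so two callers rarely share their full set
// and one abusive caller can't starve every queue another caller uses.
public final class ShuffleShard {
  private ShuffleShard() {}

  /** Derive a stable shard of queue indexes for a caller by shuffling all
   *  indexes with a seed taken from the caller's identity. */
  public static List<Integer> shardFor(String callerId, int numQueues, int shardSize) {
    List<Integer> indexes = new ArrayList<>();
    for (int i = 0; i < numQueues; i++) {
      indexes.add(i);
    }
    // Deterministic per-caller shuffle: the same caller always gets the
    // same shard, with no shared state to maintain.
    Collections.shuffle(indexes, new Random(callerId.hashCode()));
    return new ArrayList<>(indexes.subList(0, shardSize));
  }
}
```

A balancer could then apply the power-of-two-choices pick within the caller's shard rather than across all queues, combining both forms of isolation.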
We're still experimenting, so we didn't have something ready to upstream for
everyone, but that'd be a goal if we come away with something generally
applicable.
One might also ask why not just use quotas or other rate limiting. That's
actually what we've had bespoke-built into our HBase fork for years, and we're
hoping to drop it in our upgrade to HBase 2. A big problem that has continually
cropped up there is how much minor tweaking we've needed to do to those rate
limits, because our environment is highly variable over time. This system with
the custom queue and custom balancer should require very little tuning over
time, since it mostly just aims to enforce flexible fairness.
We're probably going to do a blog post with more details at some point in the
future, and maybe upstream parts of it if there's interest.
> Allow configuration of IPC queue balancer
> -----------------------------------------
>
> Key: HBASE-26703
> URL: https://issues.apache.org/jira/browse/HBASE-26703
> Project: HBase
> Issue Type: New Feature
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Minor
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> Currently we randomly assign IPC calls to queues using a RandomQueueBalancer,
> which relies on ThreadLocalRandom. I would like to make that configurable so
> that one can plug in their own queue balancer. This usefully combines with
> the existing ability to specify a pluggable queue type.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)