[
https://issues.apache.org/jira/browse/HBASE-24528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138701#comment-17138701
]
Viraj Jasani commented on HBASE-24528:
--------------------------------------
{quote}That means, we could use an in memory queue, disruptor is also fine(but
maybe a bit overkill?), to cache the records we want to save to the system
table. There will be a background task to flush the records. If the queue is
full we just drop the new records.
{quote}
All of these actions are implemented for the Admin API getSlowLogResponses(),
except for the last one. Instead of dropping new records when the queue is
full, we remove the oldest record from the head of the queue and add the new
record at the tail. EvictingQueue takes care of this.
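To make the eviction behavior concrete, here is a minimal JDK-only sketch. It
is not the HBase implementation and does not use Guava's EvictingQueue; the
class name BoundedEvictingQueue and its methods are hypothetical and only
mimic the "drop oldest at the head, add newest at the tail" policy described
above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the eviction policy; not an HBase or Guava class.
final class BoundedEvictingQueue<T> {
  private final int capacity;
  private final ArrayDeque<T> records = new ArrayDeque<>();

  BoundedEvictingQueue(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
  }

  // Adds at the tail. If the queue is full, the oldest record (head) is
  // removed first, so new records are never dropped.
  synchronized void add(T record) {
    if (records.size() == capacity) {
      records.pollFirst();
    }
    records.addLast(record);
  }

  // Returns a copy of the current contents, oldest record first.
  synchronized List<T> snapshot() {
    return new ArrayList<>(records);
  }
}

final class EvictingQueueDemo {
  public static void main(String[] args) {
    BoundedEvictingQueue<String> queue = new BoundedEvictingQueue<>(2);
    queue.add("r1");
    queue.add("r2");
    queue.add("r3"); // queue is full, so "r1" is evicted from the head
    System.out.println(queue.snapshot()); // prints [r2, r3]
  }
}
```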
Now we need this entire workflow as a service (framework) that can be consumed
by multiple clients.
HBASE-23938 uses a system table for persistence of slow logs if the related
config is turned on; however, enabling writes to a system table for each named
queue (created dynamically) might be a challenge. For the framework, we can
create named queues dynamically and provide Add and Get APIs for the respective
queues (one queue per use case), but we might not want to create a system table
to persist the records of each named queue.
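As a rough illustration of the in-memory part of such a framework, the sketch
below creates named queues on first use and exposes add and get per queue. The
class NamedQueueRegistry, its method names, and the queue names "slowLog" and
"balancerDecision" are assumptions made for this example, not existing HBase
APIs. Persistence to a system table is deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry: one bounded in-memory queue per use case.
final class NamedQueueRegistry<T> {
  private final int capacityPerQueue;
  private final Map<String, ArrayDeque<T>> queues = new HashMap<>();

  NamedQueueRegistry(int capacityPerQueue) {
    if (capacityPerQueue <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacityPerQueue = capacityPerQueue;
  }

  // Add API: creates the named queue on first use, evicts the oldest
  // record when the queue is full, then appends the new record.
  synchronized void add(String queueName, T record) {
    ArrayDeque<T> queue =
        queues.computeIfAbsent(queueName, name -> new ArrayDeque<>());
    if (queue.size() == capacityPerQueue) {
      queue.pollFirst();
    }
    queue.addLast(record);
  }

  // Get API: returns a copy of the named queue, oldest record first,
  // or an empty list if no such queue exists yet.
  synchronized List<T> get(String queueName) {
    ArrayDeque<T> queue = queues.get(queueName);
    return queue == null ? new ArrayList<>() : new ArrayList<>(queue);
  }
}

final class NamedQueueDemo {
  public static void main(String[] args) {
    NamedQueueRegistry<String> registry = new NamedQueueRegistry<>(2);
    registry.add("slowLog", "rpc-1");
    registry.add("slowLog", "rpc-2");
    registry.add("slowLog", "rpc-3"); // evicts "rpc-1"
    registry.add("balancerDecision", "move-1");
    System.out.println(registry.get("slowLog")); // prints [rpc-2, rpc-3]
    System.out.println(registry.get("balancerDecision")); // prints [move-1]
  }
}
```

Each use case only sees its own queue here, which sidesteps the lookup problem
in memory; it does not answer the question of how the records should be laid
out once they are persisted.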
If we use just one system table for all named queues, we will run into an
indexing issue, i.e. fetching all records of a specific use case or a specific
client. On the other hand, if we have a system table for each use case, that
might mean too many system tables as the number of use cases grows.
> Improve balancer decision observability
> ---------------------------------------
>
> Key: HBASE-24528
> URL: https://issues.apache.org/jira/browse/HBASE-24528
> Project: HBase
> Issue Type: New Feature
> Components: Admin, Balancer, Operability, shell, UI
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> We provide detailed INFO and DEBUG level logging of balancer decision
> factors, outcome, and reassignment planning, as well as similarly detailed
> logging of the resulting assignment manager activity. However, an operator
> may need to perform online and interactive observation, debugging, or
> performance analysis of current balancer activity. Scraping and correlating
> the many log lines resulting from a balancer execution is labor intensive and
> has a lot of latency (order of ~minutes to acquire and index, order of
> ~minutes to correlate).
> The balancer should maintain a rolling window of history, e.g. the last 100
> region move plans, or last 1000 region move plans submitted to the assignment
> manager. This history should include decision factor details and weights and
> costs. The rsgroups balancer may be able to provide fairly simple decision
> factors, like for example "this table was reassigned to that regionserver
> group". The underlying or vanilla stochastic balancer on the other hand,
> after a walk over random assignment plans, will have considered a number of
> cost functions with various inputs (locality, load, etc.) and multipliers,
> including custom cost functions. We can devise an extensible class structure
> that represents explanations for balancer decisions, and for each region move
> plan that is actually submitted to the assignment manager, we can keep the
> explanations of all relevant decision factors alongside the other details of
> the assignment plan like the region name, and the source and destination
> regionservers.
> This history should be available via API for use by new shell commands and
> admin UI widgets.
> The new shell commands and UI widgets can unpack the representation of
> balancer decision components into human readable output.