Andrew Kyle Purtell created HBASE-24528:
-------------------------------------------
Summary: Improve balancer decision observability
Key: HBASE-24528
URL: https://issues.apache.org/jira/browse/HBASE-24528
Project: HBase
Issue Type: New Feature
Components: Admin, Balancer, shell, UI
Reporter: Andrew Kyle Purtell
We provide detailed INFO and DEBUG level logging of balancer decision factors,
outcome, and reassignment planning, as well as similarly detailed logging of
the resulting assignment manager activity. However, an operator may need to
perform online and interactive observation, debugging, or performance analysis
of current balancer activity. Scraping and correlating the many log lines
resulting from a balancer execution is labor-intensive and slow (on the order
of minutes to acquire and index, then minutes more to correlate).
The balancer should maintain a rolling window of history, e.g. the last 100
or the last 1000 region move plans submitted to the assignment manager. This
history should include decision factor details, weights, and costs. The
rsgroups balancer may be able to provide fairly simple decision factors, for
example "this table was reassigned to that regionserver group". The underlying
or vanilla stochastic balancer, on the other hand, after
a walk over random assignment plans, will have considered a number of cost
functions with various inputs (locality, load, etc.) and multipliers, including
custom cost functions. We can devise an extensible class structure that
represents explanations for balancer decisions, and for each region move plan
that is actually submitted to the assignment manager, we can keep the
explanations of all relevant decision factors alongside the other details of
the assignment plan, such as the region name and the source and destination
regionservers.
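As a rough illustration of the structure described above, the following Java
sketch keeps a bounded window of explained move plans. Every name in it
(BalancerDecisionHistory, DecisionFactor, CostFunctionFactor, SimpleFactor,
ExplainedMove) is hypothetical and is not an existing HBase API. The fields
shown are assumptions about what a balancer might want to record.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch only: none of these types exist in HBase today.
final class BalancerDecisionHistory {

  /** One explanation for a decision; implementations are the extension point. */
  interface DecisionFactor {
    String describe();
  }

  /** A cost function as a stochastic balancer might report it. */
  static final class CostFunctionFactor implements DecisionFactor {
    private final String name;
    private final double multiplier;
    private final double cost;

    CostFunctionFactor(String name, double multiplier, double cost) {
      this.name = name;
      this.multiplier = multiplier;
      this.cost = cost;
    }

    @Override
    public String describe() {
      return name + " (multiplier=" + multiplier + ", cost=" + cost + ")";
    }
  }

  /** A plain statement, as an rsgroups balancer might report it. */
  static final class SimpleFactor implements DecisionFactor {
    private final String reason;

    SimpleFactor(String reason) {
      this.reason = reason;
    }

    @Override
    public String describe() {
      return reason;
    }
  }

  /** A submitted region move plan together with its explanations. */
  static final class ExplainedMove {
    final String regionName;
    final String source;
    final String destination;
    final List<DecisionFactor> factors;

    ExplainedMove(String regionName, String source, String destination,
        List<DecisionFactor> factors) {
      this.regionName = regionName;
      this.source = source;
      this.destination = destination;
      this.factors = new ArrayList<>(factors); // defensive copy
    }

    /** Renders the move and its factors as one human-readable line. */
    String describe() {
      List<String> parts = new ArrayList<>();
      for (DecisionFactor f : factors) {
        parts.add(f.describe());
      }
      return regionName + ": " + source + " -> " + destination
          + " [" + String.join("; ", parts) + "]";
    }
  }

  private final int capacity;
  private final Deque<ExplainedMove> window = new ArrayDeque<>();

  BalancerDecisionHistory(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
  }

  /** Records a move, evicting the oldest entry once the window is full. */
  synchronized void record(ExplainedMove move) {
    if (window.size() == capacity) {
      window.removeFirst();
    }
    window.addLast(move);
  }

  /** Returns a copy of the window, oldest entry first. */
  synchronized List<ExplainedMove> snapshot() {
    return new ArrayList<>(window);
  }
}
```

A shell command or UI widget could then print describe() for each entry
returned by snapshot(). New balancers or custom cost functions would add their
own DecisionFactor implementations without changing the history itself.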
This history should be available via API for use by new shell commands and
admin UI widgets.
The new shell commands and UI widgets can unpack the representation of balancer
decision components into human-readable output.