[jira] [Commented] (HBASE-24528) Improve balancer decision observability

Andrew Kyle Purtell (Jira) Wed, 10 Jun 2020 12:29:20 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-24528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132660#comment-17132660
 ]


Andrew Kyle Purtell commented on HBASE-24528:
---------------------------------------------

{quote}how we solve using workflow in HMaster vs RegionServer is where I am 
stuck currently.
{quote}
At a high level, we factor out a single common framework: the API for logging 
to an in-memory ring buffer, the ring buffer implementation, and the RPC 
service supporting it. The framework supports allocation of one or more _named_ 
ring buffers. A ring buffer can optionally also have write-behind persistence. 
The size of the ring buffer, the persistence option, and the persistence 
configuration can be constructor parameters, with a builder API for added 
convenience. All of this goes into hbase-server. Then both the master and 
regionserver implementations can import the framework and instantiate ring 
buffers with specific names. The regionserver has one named "slow_log". The 
master has one named "balancer", or maybe "balancer_plans".
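
To make the shape of this concrete, here is a rough sketch of what the named 
ring buffer, its builder, and a per-process registry might look like. Every 
name here (NamedRingBuffer, RingBufferRegistry, capacity, writeBehind) is 
hypothetical and not existing HBase API, and the write-behind hook is called 
inline for brevity, where a real implementation would presumably hand off to a 
background thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical fixed-capacity ring buffer; the oldest entry is evicted when full. */
final class NamedRingBuffer<T> {
  private final String name;
  private final int capacity;
  private final Consumer<T> persister; // optional write-behind hook, no-op by default
  private final ArrayDeque<T> entries;

  private NamedRingBuffer(String name, int capacity, Consumer<T> persister) {
    this.name = name;
    this.capacity = capacity;
    this.persister = persister;
    this.entries = new ArrayDeque<>(capacity);
  }

  String name() { return name; }

  synchronized void add(T entry) {
    if (entries.size() == capacity) {
      entries.removeFirst(); // ring semantics: drop the oldest entry
    }
    entries.addLast(entry);
    persister.accept(entry); // assumption: inline call stands in for async write-behind
  }

  /** Returns a copy of the current contents, oldest first. */
  synchronized List<T> snapshot() {
    return new ArrayList<>(entries);
  }

  static <T> Builder<T> builder(String name) { return new Builder<>(name); }

  static final class Builder<T> {
    private final String name;
    private int capacity = 100;
    private Consumer<T> persister = e -> { };

    private Builder(String name) { this.name = name; }

    Builder<T> capacity(int capacity) {
      if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
      this.capacity = capacity;
      return this;
    }

    Builder<T> writeBehind(Consumer<T> persister) {
      this.persister = persister;
      return this;
    }

    NamedRingBuffer<T> build() { return new NamedRingBuffer<>(name, capacity, persister); }
  }
}

/** Hypothetical registry that an RPC handler could use to find a buffer by name. */
final class RingBufferRegistry {
  private final Map<String, NamedRingBuffer<?>> buffers = new ConcurrentHashMap<>();

  <T> NamedRingBuffer<T> register(NamedRingBuffer<T> buffer) {
    if (buffers.putIfAbsent(buffer.name(), buffer) != null) {
      throw new IllegalStateException("duplicate ring buffer: " + buffer.name());
    }
    return buffer;
  }

  /** Returns the buffer registered under the given name, or null if there is none. */
  NamedRingBuffer<?> lookup(String name) { return buffers.get(name); }
}
```

Under these assumptions the master would register something like 
NamedRingBuffer.<String>builder("balancer").capacity(1000).build() at startup, 
and the regionserver would do the same for "slow_log".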

Then, shell commands and UI widgets can use the RPC service to look up a named 
ring buffer and enumerate its contents (or its backing persistence) to display 
whatever they need.

What is put into a ring buffer can be anything, but we presume it is some 
protobuf-encoded structure for ease of serialization and deserialization and 
for cross-version compatibility.

> Improve balancer decision observability
> ---------------------------------------
>
>                 Key: HBASE-24528
>                 URL: https://issues.apache.org/jira/browse/HBASE-24528
>             Project: HBase
>          Issue Type: New Feature
>          Components: Admin, Balancer, Operability, shell, UI
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>
> We provide detailed INFO and DEBUG level logging of balancer decision 
> factors, outcome, and reassignment planning, as well as similarly detailed 
> logging of the resulting assignment manager activity. However, an operator 
> may need to perform online and interactive observation, debugging, or 
> performance analysis of current balancer activity. Scraping and correlating 
> the many log lines resulting from a balancer execution is labor intensive and 
> has a lot of latency (order of ~minutes to acquire and index, order of 
> ~minutes to correlate). 
> The balancer should maintain a rolling window of history, e.g. the last 100 
> region move plans, or last 1000 region move plans submitted to the assignment 
> manager. This history should include decision factor details and weights and 
> costs. The rsgroups balancer may be able to provide fairly simple decision 
> factors, like for example "this table was reassigned to that regionserver 
> group". The underlying or vanilla stochastic balancer on the other hand, 
> after a walk over random assignment plans, will have considered a number of 
> cost functions with various inputs (locality, load, etc.) and multipliers, 
> including custom cost functions. We can devise an extensible class structure 
> that represents explanations for balancer decisions, and for each region move 
> plan that is actually submitted to the assignment manager, we can keep the 
> explanations of all relevant decision factors alongside the other details of 
> the assignment plan like the region name, and the source and destination 
> regionservers. 
> This history should be available via API for use by new shell commands and 
> admin UI widgets.
> The new shell commands and UI widgets can unpack the representation of 
> balancer decision components into human readable output. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
