[ https://issues.apache.org/jira/browse/HBASE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reassigned HBASE-11062:
--------------------------------------

    Assignee: Andrew Purtell

Consider something like NNtop (HDFS-6982) and YARN top (YARN-3348).

Features in common:
- Command line utility
- Unix top-like presentation (curses-like interface)
- Cluster health in display header
- Summary utilization metrics
- Windowed data collection
- Cache data for short periods of time
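Caching could be as simple as keeping the most recent snapshot for a short TTL, so that repeated screen refreshes don't each hit the master. Below is a minimal sketch, assuming a single fetch function that returns the latest stats. The class name `CachedSnapshot` and its injectable clock are hypothetical and are not existing HBase code.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: serves a cached snapshot until it is older than the TTL. */
public class CachedSnapshot<T> {
    private final Supplier<T> fetcher;       // e.g. a call that pulls stats from the master (assumed)
    private final long ttlMillis;            // how long a snapshot stays fresh
    private final LongSupplier clockMillis;  // injectable clock, e.g. System::currentTimeMillis
    private T cached;
    private long fetchedAtMillis;
    private boolean hasValue = false;

    public CachedSnapshot(Supplier<T> fetcher, long ttlMillis, LongSupplier clockMillis) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns the cached snapshot, refetching only once the TTL has elapsed. */
    public synchronized T get() {
        long now = clockMillis.getAsLong();
        if (!hasValue || now - fetchedAtMillis >= ttlMillis) {
            cached = fetcher.get();
            fetchedAtMillis = now;
            hasValue = true;
        }
        return cached;
    }
}
```

In a real tool the clock would be `System::currentTimeMillis`. The clock is injectable here only so that expiry is easy to exercise.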

I think our approach would look a lot like HDFS's. All of the information 
needed for NNtop is, as you might expect, collected and exported by a singleton 
process, the NameNode. We can do something similar with our master. All 
regionservers already periodically report load statistics to the master; this 
is what populates the data returned by Admin#getClusterStatus. We'd augment the 
regionserver reports with top-K usage stats. Like HDFS-6982, we'd collect and 
manage the information in a MetricsSource implementation, thus exposing it via 
JMX and HTTP for use by a CLI tool. See the patch on HDFS-6982 for a sketch of 
what a patch for our master might look like.
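The following sketch makes "windowed data collection" and "top-K usage stats" concrete. It shows a sliding-window counter that a regionserver could keep before shipping a summary in its report. It assumes a ring of fixed-width time buckets. `WindowedTopK` and its methods are hypothetical, and this is neither the HDFS-6982 code nor an existing MetricsSource.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sliding-window top-K counter built from fixed-width time buckets. */
public class WindowedTopK {
    private final long bucketMillis;
    private final long[] bucketEpoch;               // time bucket each slot holds, -1 if unused
    private final List<Map<String, Long>> buckets;  // per-slot access counts by key

    public WindowedTopK(int numBuckets, long bucketMillis) {
        this.bucketMillis = bucketMillis;
        this.bucketEpoch = new long[numBuckets];
        Arrays.fill(bucketEpoch, -1L);
        this.buckets = new ArrayList<>(numBuckets);
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new HashMap<>());
        }
    }

    /** Count one access to key (e.g. a table, user, or client) at time nowMillis. */
    public synchronized void record(String key, long nowMillis) {
        long epoch = nowMillis / bucketMillis;
        int slot = (int) (epoch % bucketEpoch.length);
        if (bucketEpoch[slot] != epoch) {
            // The slot still holds counts from an older window: reuse it.
            buckets.get(slot).clear();
            bucketEpoch[slot] = epoch;
        }
        buckets.get(slot).merge(key, 1L, Long::sum);
    }

    /** The k most accessed keys over the window ending at nowMillis, ties broken by key. */
    public synchronized List<Map.Entry<String, Long>> topK(int k, long nowMillis) {
        long current = nowMillis / bucketMillis;
        long oldest = current - bucketEpoch.length + 1;
        Map<String, Long> totals = new HashMap<>();
        for (int i = 0; i < bucketEpoch.length; i++) {
            // Only sum buckets that are in use and fall inside the current window.
            if (bucketEpoch[i] >= 0 && bucketEpoch[i] >= oldest && bucketEpoch[i] <= current) {
                buckets.get(i).forEach((key, n) -> totals.merge(key, n, Long::sum));
            }
        }
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(k)
                .collect(Collectors.toList());
    }
}
```

Buckets keep memory bounded per time slice at the cost of coarse window edges. A production version would also need to cap the number of distinct keys tracked per bucket.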

Views to be presented by the CLI tool:
* Status header: master uptime, live servers, dead servers, aggregate ops/sec
* Default (Table oriented)
** By table, drill down to region
** By user identity
** By client location
* Namespace
** By namespace, drill down to table
** By user identity
** By client location
* Region
** By column family, drill down to CF
** By key
** By operation type
** By user identity
** By client location
* Column family
** By key
** By operation type
** By user identity
** By client location

Columns in the views:
* Primary sort order
* Secondary sort order
* Total access count per second
* Summary access latency in ms (avg, p75, p90, p95, p99, max; adjustable with 
keypress)
* Data volume (display unit adjustable with keypress)
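One way the latency column could work is to keep the samples for a row, sort them, and report whichever statistic is currently selected. A keypress would cycle to the next statistic. This sketch uses nearest-rank percentiles over raw samples. `LatencySummary` is a hypothetical name, and a real implementation would more likely use a histogram than retain every sample.

```java
import java.util.Arrays;

/** Hypothetical latency summary (ms) for one row of a view. */
public class LatencySummary {
    /** Statistics the latency column can show; a keypress would advance to the next one. */
    public enum Stat {
        AVG, P75, P90, P95, P99, MAX;

        public Stat next() {
            return values()[(ordinal() + 1) % values().length];
        }
    }

    private final long[] sorted;

    public LatencySummary(long[] samplesMs) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        sorted = samplesMs.clone();
        Arrays.sort(sorted);
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    private long percentile(int p) {
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public double value(Stat stat) {
        switch (stat) {
            case AVG: return Arrays.stream(sorted).average().orElse(0.0);
            case P75: return percentile(75);
            case P90: return percentile(90);
            case P95: return percentile(95);
            case P99: return percentile(99);
            case MAX: return sorted[sorted.length - 1];
            default: throw new AssertionError("unknown stat " + stat);
        }
    }
}
```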

Sort ordering:
||View||Primary||Secondary||
|Table (default view)|Table|Region|
|Table by user|User|Table|
|Table by client|Client|Table|
|Namespace|Namespace|Table|
|Namespace by user|User|Namespace|
|Namespace by client|Client|Namespace|
|Region by CF (default region view)|Region|CF|
|Region by key|Key|Region|
|Region by operation|Op type|Region|
|Region by user|User|Region|
|Region by client|Client|Region|
|CF by key (default CF view)|Key|CF|
|CF by operation|Op type|CF|
|CF by user|User|CF|
|CF by client|Client|CF|
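The table maps naturally onto one comparator per view, which orders rows by the primary field and then by the secondary field. Below is a minimal sketch, assuming a flat hypothetical `Row` record that carries all of the grouping fields. Only three of the views are wired up, and none of these types exist in HBase.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical row model and view definitions for the sort-order table. */
public class ViewSort {
    /** One aggregated row; field names are illustrative only. */
    public record Row(String namespace, String table, String region, String user,
                      String client, long accessesPerSec) {}

    /** A view is a pair of key extractors: the primary and secondary sort fields. */
    public record View(String name, Function<Row, String> primary, Function<Row, String> secondary) {
        Comparator<Row> comparator() {
            return Comparator.comparing(primary).thenComparing(secondary);
        }
    }

    public static final View TABLE = new View("Table", Row::table, Row::region);
    public static final View TABLE_BY_USER = new View("Table by user", Row::user, Row::table);
    public static final View NAMESPACE = new View("Namespace", Row::namespace, Row::table);

    /** Returns a copy of rows in the order the given view would display them. */
    public static List<Row> render(List<Row> rows, View view) {
        List<Row> out = new ArrayList<>(rows);
        out.sort(view.comparator());
        return out;
    }
}
```

Each remaining row of the table would add one more `View` constant.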

Where not sorting by operation type, we should separate op counts and 
latencies for read and write operations into their own columns.
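Below is a minimal sketch of splitting counts and latencies into separate read and write columns. The `Op` enum and its read/write classification are assumptions for illustration. Get and Scan count as reads, while Put, Delete, Append and Increment count as writes.

```java
/** Hypothetical per-row accumulator that keeps read and write stats in separate columns. */
public class ReadWriteColumns {
    /** Operation types; which ones count as reads is an assumption for this sketch. */
    public enum Op {
        GET(true), SCAN(true), PUT(false), DELETE(false), APPEND(false), INCREMENT(false);

        final boolean isRead;

        Op(boolean isRead) {
            this.isRead = isRead;
        }
    }

    private long readCount;
    private long writeCount;
    private long readLatencyTotalMs;
    private long writeLatencyTotalMs;

    /** Attribute one completed operation to the read or the write columns. */
    public void record(Op op, long latencyMs) {
        if (op.isRead) {
            readCount++;
            readLatencyTotalMs += latencyMs;
        } else {
            writeCount++;
            writeLatencyTotalMs += latencyMs;
        }
    }

    public long readCount() { return readCount; }

    public long writeCount() { return writeCount; }

    public double avgReadLatencyMs() {
        return readCount == 0 ? 0.0 : (double) readLatencyTotalMs / readCount;
    }

    public double avgWriteLatencyMs() {
        return writeCount == 0 ? 0.0 : (double) writeLatencyTotalMs / writeCount;
    }
}
```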

In most views the contents of the secondary sort field won't change.

Interesting future ideas:
* Monitored tasks (HBASE-4349)
* Read replica awareness

What else?

> htop
> ----
>
>                 Key: HBASE-11062
>                 URL: https://issues.apache.org/jira/browse/HBASE-11062
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>
> A top-like monitor could be useful for testing, debugging, and operating 
> clusters of moderate size, and possibly for diagnosing issues in large 
> clusters.
> Consider a curses interface like the one presented by atop 
> (http://www.atoptool.nl/images/screenshots/genericw.png) - with aggregate 
> metrics collected over a monitoring interval in the upper portion of the 
> pane, and a listing of discrete measurements sorted and filtered by various 
> criteria in the bottom part of the pane. One might imagine a cluster overview 
> with cluster aggregate metrics above and a list of regionservers sorted by 
> utilization below; and a regionserver view with process metrics above and a 
> list of metrics by operation type below, or a list of client connections, or 
> a list of threads, sorted by utilization, throughput, or latency. 
> The generic name 'htop' is taken, but it would be distinctive in the HBase 
> context, e.g. as a utility org.apache.hadoop.hbase.HTop.
> A curses interface isn't strictly necessary. It could be an external monitor 
> with a web front end, as has been discussed before. I do like the idea of a 
> process that runs in a terminal, because I interact with dev and test HBase 
> clusters exclusively over SSH.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
