> 
> Parfait can poll JMX counters, or counters can be invoked directly. I'm
> working on a MetricContext that exports all HBase and Hadoop JMX counters
> into Parfait. The goal is to let PCP visualize data more effectively for
> HBase/Hadoop clusters. For an example of the sort of visualization I'd love
> to have for HBase and Hadoop, see the simple working picture of a 3D
> visualisation at [4] below. It's basic, but imagine a 3D view of all the
> HBase region servers showing HBase-specific metrics, played back in real
> time or retrospectively at any pace you want.
> 
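To make "exports all HBase and Hadoop JMX counters" concrete, here is a minimal sketch of the polling side using only the standard javax.management API. This is not Parfait's MetricContext or any Parfait API. The class name JmxCounterPoller is hypothetical, and "hadoop:*" as the domain pattern for HBase/Hadoop MBeans is an assumption; the demo defaults to "java.lang:*", which every JVM registers.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: reads every numeric attribute of every MBean matching a pattern.
public class JmxCounterPoller {

    /** Returns values keyed as "<ObjectName>.<attribute>", sorted by key. */
    public static Map<String, Number> poll(MBeanServer server, String pattern) {
        ObjectName query;
        try {
            query = new ObjectName(pattern);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad ObjectName pattern: " + pattern, e);
        }
        Map<String, Number> values = new TreeMap<>();
        for (ObjectName name : server.queryNames(query, null)) {
            MBeanAttributeInfo[] attrs;
            try {
                attrs = server.getMBeanInfo(name).getAttributes();
            } catch (Exception e) {
                continue; // MBean unregistered meanwhile, or could not be introspected
            }
            for (MBeanAttributeInfo attr : attrs) {
                if (!attr.isReadable()) continue;
                try {
                    Object v = server.getAttribute(name, attr.getName());
                    if (v instanceof Number) {
                        values.put(name + "." + attr.getName(), (Number) v);
                    }
                } catch (Exception e) {
                    // Some getters throw (e.g. unsupported operations); skip that attribute.
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumption: pass "hadoop:*" inside an HBase/Hadoop process to pick up its counters.
        String pattern = args.length > 0 ? args[0] : "java.lang:*";
        poll(server, pattern).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A real exporter would call poll() on a schedule and push each value into whatever metric registry it feeds; only numeric attributes are kept because those are the ones that map naturally onto counters and gauges.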

BTW, we also export all the JVM metrics here: GC activity (rates and time
spent, for both major and minor GCs), class compilations, and memory segment
sizes (heap, perm gen, code area, etc.).
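For reference, the standard java.lang.management MXBeans expose roughly this set of JVM metrics. The sketch below is an assumption about how such values can be read, not a description of how Parfait collects them; class and method names are hypothetical. Note that CompilationMXBean reports only cumulative JIT compilation time (not a count of compilations), and memory pool names vary by JVM version and collector ("Code Cache", "Perm Gen" on older JVMs, "Metaspace" on newer ones).

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical sampler for GC, JIT and memory pool metrics from the platform MXBeans.
public class JvmMetricsSample {

    /** Sums collection count and time (ms) over all collectors (minor and major). */
    public static long[] gcTotals() {
        long count = 0, timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    /** Converts a counter delta over an interval in nanoseconds into a per-second rate. */
    public static double perSecond(long delta, long intervalNanos) {
        return delta * 1e9 / intervalNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        // Per-collector totals; the collector name tells minor (young) from major (old) GC.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("GC %s: %d collections, %d ms%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean(); // null if no JIT
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.printf("JIT %s: %d ms compiling%n", jit.getName(), jit.getTotalCompilationTime());
        }
        // Heap and non-heap pools (e.g. code cache, perm gen or metaspace).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                System.out.printf("Pool %s (%s): %d bytes used%n",
                    pool.getName(), pool.getType(), usage.getUsed());
            }
        }
        // Rates are derived from two samples of the cumulative counters.
        long[] before = gcTotals();
        long t0 = System.nanoTime();
        Thread.sleep(1000);
        long[] after = gcTotals();
        long elapsed = System.nanoTime() - t0;
        System.out.printf("GC rate: %.2f collections/s, %.1f ms of GC per second%n",
            perSecond(after[0] - before[0], elapsed), perSecond(after[1] - before[1], elapsed));
    }
}
```

The MXBeans only offer cumulative counters, so rates have to be computed by the consumer from successive samples; tools like PCP typically do this when displaying counter-type metrics.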

If HBase metrics like compactions and splits were exported into PCP, one could
see their impact across the hardware (CPU, virtual memory, disk) alongside
JVM-level data (heap sizes and GC), and correlate all of it with HBase activity.

Parfait can also collect metrics per thread (via a ThreadLocal), which allows
collection for individual requests. For example, right now in production we can
see this sort of data in our log files for every request (a Controller/Servlet):

    [2010-04-07 11:06:28,569 INFO ][EventMetricCollector][http-2001-Processor85 g7pfur4y][59.167.192.26][228349]
    Top  ViewCorrespondenceControl  ViewCorrespondenceControl
        Elapsed time:              own 3113ms, total 3117ms
        Total CPU:                 own 10ms, total 30ms
        User CPU:                  own 10ms, total 20ms
        System CPU:                own 10ms, total 10ms
        Blocked count:             own 0, total 0
        Blocked time:              own 0ms, total 0ms
        Wait count:                own 0, total 0
        Wait time:                 own 0ms, total 0ms
        Database execution time:   own 3050ms, total 3050ms
        Database execution count:  own 12, total 12
        total Database CPU time:   own 0, total 0
        Error Pages:               own 0, total 1
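A minimal sketch of how per-thread "own" and "total" figures like those above can be produced, assuming "total" covers an event including anything nested inside it and "own" excludes the nested events. It keeps a ThreadLocal stack of open events and samples the standard ThreadMXBean counters at start and stop. RequestMetrics, start() and stop() are hypothetical names, not Parfait's EventMetricCollector API. Blocked/wait times are omitted because they need thread contention monitoring enabled; System CPU would be Total CPU minus User CPU.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical per-thread event timer with own/total accounting for nested events.
public final class RequestMetrics {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();
    private static final String[] NAMES =
        {"Elapsed time", "Total CPU", "User CPU", "Blocked count", "Wait count"};
    // Raw samples are nanoseconds for the first three metrics; divide to print ms.
    private static final long[] DIVISOR = {1_000_000, 1_000_000, 1_000_000, 1, 1};
    private static final String[] UNIT = {"ms", "ms", "ms", "", ""};

    /** One open event on this thread's stack. */
    private static final class Frame {
        final String name;
        final long[] start;
        final long[] childTotals = new long[NAMES.length];
        Frame(String name, long[] start) { this.name = name; this.start = start; }
    }

    /** Own (excluding nested events) and total (including them) deltas for one event. */
    public static final class Result {
        public final String name;
        public final long[] own;
        public final long[] total;
        Result(String name, long[] own, long[] total) { this.name = name; this.own = own; this.total = total; }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder(name);
            for (int i = 0; i < NAMES.length; i++) {
                sb.append("  ").append(NAMES[i])
                  .append(": own ").append(own[i] / DIVISOR[i]).append(UNIT[i])
                  .append(", total ").append(total[i] / DIVISOR[i]).append(UNIT[i]);
            }
            return sb.toString();
        }
    }

    private static final ThreadLocal<Deque<Frame>> STACK = ThreadLocal.withInitial(ArrayDeque::new);

    /** Current thread's counters; a -1 CPU time (measurement disabled) is clamped to 0. */
    private static long[] sample() {
        ThreadInfo info = THREADS.getThreadInfo(Thread.currentThread().getId());
        return new long[] {
            System.nanoTime(),
            Math.max(0, THREADS.getCurrentThreadCpuTime()),
            Math.max(0, THREADS.getCurrentThreadUserTime()),
            info == null ? 0 : info.getBlockedCount(),
            info == null ? 0 : info.getWaitedCount()
        };
    }

    public static void start(String event) {
        STACK.get().push(new Frame(event, sample()));
    }

    /** Ends the innermost open event; throws NoSuchElementException if none is open. */
    public static Result stop() {
        long[] end = sample();
        Deque<Frame> stack = STACK.get();
        Frame f = stack.pop();
        Frame parent = stack.peek();
        long[] own = new long[NAMES.length];
        long[] total = new long[NAMES.length];
        for (int i = 0; i < NAMES.length; i++) {
            total[i] = end[i] - f.start[i];
            own[i] = total[i] - f.childTotals[i];
            if (parent != null) parent.childTotals[i] += total[i]; // charge child to parent's total only
        }
        if (stack.isEmpty()) STACK.remove(); // don't leak state on pooled request threads
        return new Result(f.name, own, total);
    }
}
```

In a servlet container one would presumably call start() at the top of a request (e.g. in a filter) and stop() in a finally block, then log the Result; nested calls such as database access get their own start()/stop() pair so their cost shows up in the parent's "total" but not its "own".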

I'd hope that through a similar mechanism I could instrument the HBase Scan
costs of a particular user activity and see how many rows were scanned, and how
many cell values were picked out, for a single request. This lets us narrow in
quickly on which activity (controller action) or which users are consuming the
most of a certain resource, find out why, and fix it. We also import our PCP
data into a data warehouse for longer-term capacity planning.
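One way the per-request Scan accounting could look, sketched under assumptions: a scanner is modelled as an Iterator over rows, each row a List of cells (a stand-in for HBase's Result type, so the example needs no HBase jars). The wrapper counts rows and cells into ThreadLocal counters, and endRequest() attributes the cells to a key such as a controller action or user so the top consumers can be ranked. ScanCostTracker and all its methods are hypothetical. It only counts what the client actually receives; rows skipped by server-side filters would not show up.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-request scan cost tracker; rows are modelled as lists of cells.
public final class ScanCostTracker {
    // Per-request counters for the current thread: [rowsRead, cellsRead].
    private static final ThreadLocal<long[]> CURRENT = ThreadLocal.withInitial(() -> new long[2]);
    // Cumulative cells read, keyed by controller action or user.
    private static final Map<String, LongAdder> CELLS_BY_KEY = new ConcurrentHashMap<>();

    /** Wraps a scanner so every row handed to the caller is counted. */
    public static <C> Iterator<List<C>> counting(Iterator<List<C>> rows) {
        return new Iterator<List<C>>() {
            @Override public boolean hasNext() { return rows.hasNext(); }

            @Override public List<C> next() {
                List<C> row = rows.next();
                long[] c = CURRENT.get();
                c[0]++;              // one more row read
                c[1] += row.size();  // cells picked out of that row
                return row;
            }
        };
    }

    /** Ends the request: charges its cells to the key and resets this thread's counters. */
    public static long[] endRequest(String key) {
        long[] c = CURRENT.get();
        CURRENT.remove();
        CELLS_BY_KEY.computeIfAbsent(key, k -> new LongAdder()).add(c[1]);
        return c;
    }

    /** Up to n keys ordered by cumulative cells read, highest first. */
    public static List<Map.Entry<String, Long>> topConsumers(int n) {
        List<Map.Entry<String, Long>> list = new ArrayList<>();
        CELLS_BY_KEY.forEach((k, v) -> list.add(Map.entry(k, v.sum())));
        list.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return list.subList(0, Math.min(n, list.size()));
    }
}
```

Keeping the counters in a ThreadLocal mirrors the per-thread collection described earlier: no locking is needed while a request runs, and only the final per-key aggregation touches shared state.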

Anyway, some more ideas to kick around and discuss.

Paul
