> Parfait can poll JMX counters, or counters can be invoked directly. I'm
> working on a MetricContext that exports all HBase and Hadoop JMX counters
> into Parfait. The goal is to let PCP visualize data more effectively for
> HBase/Hadoop clusters. For an example of the sort of visualization I'd love
> to have for HBase and Hadoop, see the simple working picture of a 3D
> visualisation at [4] below. That one is basic, but imagine a 3D
> visualisation of all the HBase region servers showing HBase-specific
> metrics, played back in real time, or retrospectively at any pace you want.
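To make the "poll JMX counters" idea concrete, here is a minimal sketch using only the standard `java.lang.management` and `javax.management` APIs. `JmxCounterPoller`, its methods and the metric names are hypothetical and are not part of Parfait. The platform MBeans it queries (`java.lang:type=Threading`, `java.lang:type=ClassLoading`) exist in every JVM. The real HBase/Hadoop MBean names are not assumed here and would need to be looked up in those projects.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/**
 * Hypothetical sketch, not Parfait's API: read numeric JMX attributes by
 * ObjectName on each poll so they can be handed to a metrics exporter.
 */
public class JmxCounterPoller {
    private final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    // metric name -> {ObjectName, attribute name}, kept in registration order
    private final Map<String, String[]> targets = new LinkedHashMap<>();

    /** Register a metric whose value comes from an MBean attribute. */
    public void register(String metricName, String objectName, String attribute) {
        targets.put(metricName, new String[] {objectName, attribute});
    }

    /** Read every registered attribute once; non-numeric values are skipped. */
    public Map<String, Number> poll() {
        Map<String, Number> values = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : targets.entrySet()) {
            try {
                Object value = server.getAttribute(
                        new ObjectName(e.getValue()[0]), e.getValue()[1]);
                if (value instanceof Number) {
                    values.put(e.getKey(), (Number) value);
                }
            } catch (JMException ex) {
                // Covers malformed names, missing MBeans and missing attributes
                throw new IllegalStateException("Cannot read " + e.getKey(), ex);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        JmxCounterPoller poller = new JmxCounterPoller();
        // Platform MBeans that exist in every JVM; HBase/Hadoop MBeans would be
        // registered the same way once their ObjectNames are known.
        poller.register("jvm.threads.live", "java.lang:type=Threading", "ThreadCount");
        poller.register("jvm.classes.loaded", "java.lang:type=ClassLoading", "LoadedClassCount");
        poller.poll().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

In practice `poll()` would be called on a fixed schedule (for example from a `ScheduledExecutorService`), and each snapshot would be handed to whatever exports values to PCP.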
BTW, we also export all the JVM metrics here: GC activity (rates and time spent, for both major and minor GCs), class compilations, and memory segment sizes (heap, perm gen, code area, etc.). If HBase metrics such as compactions and splits were exported into PCP, one could correlate HBase activity with its impact on the hardware (CPU, virtual memory, disk) and on the JVM (heap sizes and GC).

Parfait can also collect metrics on a per-thread basis (via a ThreadLocal), which allows metrics to be collected for individual requests. For example, right now in production we can see this sort of data in our log files for every request (a Controller/Servlet):

    [2010-04-07 11:06:28,569 INFO ][EventMetricCollector][http-2001-Processor85 g7pfur4y][59.167.192.26][228349] Top ViewCorrespondenceControl
        ViewCorrespondenceControl
        Elapsed time: own 3113ms, total 3117ms
        Total CPU: own 10ms, total 30ms
        User CPU: own 10ms, total 20ms
        System CPU: own 10ms, total 10ms
        Blocked count: own 0, total 0
        Blocked time: own 0ms, total 0ms
        Wait count: own 0, total 0
        Wait time: own 0ms, total 0ms
        Database execution time: own 3050ms, total 3050ms
        Database execution count: own 12, total 12
        total Database CPU time: own 0, total 0
        Error Pages: own 0, total 1

I'd hope that, through a similar mechanism, I could instrument the cost of HBase Scans for a particular user activity and see how many rows were scanned and how many cell values were returned for a single request. This lets us quickly narrow down which activity (controller action) or which users are consuming the most of a given resource, find out why, and fix it. We also import our PCP data into a data warehouse for longer-term capacity planning.

Anyway, some more ideas to kick around and discuss.

Paul
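The per-request "own" versus "total" figures in the log excerpt can be sketched with a ThreadLocal stack of events, one stack per request thread. This is a minimal illustration, not Parfait's implementation. `RequestMetrics`, its methods and counter names such as `rowsScanned` are hypothetical. The sketch also assumes that "own" means the event excluding any nested events, and that "total" includes them.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Parfait's actual classes: per-thread event metrics
 * that report an "own" value (this event only) and a "total" value
 * (this event plus any events nested inside it).
 */
public final class RequestMetrics {

    /** One in-progress event on the current thread's stack. */
    private static final class Frame {
        final String name;
        final long startNanos = System.nanoTime();
        long childElapsedNanos;                                // time spent in nested events
        final Map<String, Long> own = new HashMap<>();         // counters bumped by this event
        final Map<String, Long> childTotals = new HashMap<>(); // counters rolled up from children

        Frame(String name) {
            this.name = name;
        }
    }

    /** Own/total figures reported when an event finishes. */
    public static final class Result {
        public final String name;
        public final long ownElapsedNanos;
        public final long totalElapsedNanos;
        public final Map<String, Long> own;
        public final Map<String, Long> total;

        Result(String name, long ownElapsedNanos, long totalElapsedNanos,
               Map<String, Long> own, Map<String, Long> total) {
            this.name = name;
            this.ownElapsedNanos = ownElapsedNanos;
            this.totalElapsedNanos = totalElapsedNanos;
            this.own = Collections.unmodifiableMap(new HashMap<>(own));
            this.total = Collections.unmodifiableMap(new HashMap<>(total));
        }
    }

    // Each thread (i.e. each in-flight request) gets its own stack of events.
    private static final ThreadLocal<Deque<Frame>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    private RequestMetrics() {
    }

    /** Begin an event, e.g. a controller action or a nested HBase scan. */
    public static void start(String name) {
        STACK.get().push(new Frame(name));
    }

    /** Add to a counter owned by the innermost running event, e.g. "rowsScanned". */
    public static void increment(String counter, long delta) {
        Frame current = STACK.get().peek();
        if (current != null) {
            current.own.merge(counter, delta, Long::sum);
        }
    }

    /** Finish the innermost event and roll its totals up into its parent, if any. */
    public static Result end() {
        Deque<Frame> stack = STACK.get();
        Frame frame = stack.pop();
        long totalElapsed = System.nanoTime() - frame.startNanos;

        // total = this event's own counters plus everything its children reported
        Map<String, Long> total = new HashMap<>(frame.own);
        frame.childTotals.forEach((k, v) -> total.merge(k, v, Long::sum));

        Frame parent = stack.peek();
        if (parent != null) {
            parent.childElapsedNanos += totalElapsed;
            total.forEach((k, v) -> parent.childTotals.merge(k, v, Long::sum));
        } else {
            STACK.remove(); // top-level event done: don't leak state on pooled threads
        }
        return new Result(frame.name, totalElapsed - frame.childElapsedNanos,
                totalElapsed, frame.own, total);
    }
}
```

A controller would call `RequestMetrics.start("ViewCorrespondenceControl")` on entry. It would wrap each HBase scan in its own `start`/`end` pair, calling `increment("rowsScanned", n)` as rows are read, and log the `Result` returned by the outermost `end()`. Each `end()` belongs in a `finally` block so that the stack stays balanced. CPU figures like those in the log could be tracked the same way by sampling `ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime()` alongside `System.nanoTime()`.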