that's the bit that's 'coming'.  

In theory, the plan will look a bit like this:

* install PCP on all nodes in the cluster (RPM and Debian packages should be 
available here: ftp://oss.sgi.com/projects/pcp/download/)
* download the PCP Glider (http://oss.sgi.com/projects/pcp/pcp-gui.html) UI 
tools; you can just install those on your own desktop, or wherever you wish to 
run them from
* drop the jar I'll provide (hopefully RSN) into hbase/lib on all nodes in the 
cluster
* modify hadoop-metrics.properties to specify the new PCPMetricContext
* fire up HBase
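
Since the properties change has to be made on every node, it could be 
scripted. Here is a minimal sketch, not the final tooling: it assumes the 
metrics context is selected with the usual "<prefix>.class" and 
"<prefix>.period" keys in hadoop-metrics.properties. The class name 
example.pcp.PCPMetricContext is a placeholder, since the real package name will 
only be known once the jar exists, and set_metrics_context is a made-up helper 
name.

```python
def set_metrics_context(properties_text, prefix, context_class, period=10):
    """Return properties_text with <prefix>.class and <prefix>.period set."""
    wanted = {
        f"{prefix}.class": context_class,
        f"{prefix}.period": str(period),
    }
    out, seen = [], set()
    for line in properties_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        if stripped and not stripped.startswith("#") and key in wanted:
            # Replace an existing, uncommented setting in place.
            out.append(f"{key}={wanted[key]}")
            seen.add(key)
        else:
            # Keep comments, blank lines and unrelated settings untouched.
            out.append(line)
    # Append any setting that was not already present.
    for key, value in wanted.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = "hbase.class=org.apache.hadoop.metrics.spi.NullContext\n"
    # Placeholder class name: the real one ships with the jar.
    print(set_metrics_context(original, "hbase", "example.pcp.PCPMetricContext"))
```

The function only transforms text, so you can diff the result before writing 
it back to hbase/conf on each node.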

At this point you can point the PCP client tools (pmchart, pmdumptext, etc.) at 
any and all nodes to pull hardware, OS, Java, and HBase/Hadoop metrics out.

* run a script that we'll provide, pointed at hbase/conf/regionservers, which 
will convert the topology into canned configurations for the visualizers.  This 
step is just to get a basic known-good working viz of the cluster; one could in 
theory point any of the tools at any or all of the nodes in the cluster and 
cherry-pick the metrics you want to look at.
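
To give a feel for the topology step, here is a rough stand-in for that script 
(not the real one, which isn't written yet). It reads a regionservers file, 
assumed to hold one hostname per line, and builds a pmdumptext command line 
using host:metric specifications. The function names are invented for 
illustration, and kernel.all.load is used only as an example of a base OS 
metric; the HBase metric names will depend on the jar.

```python
import shlex


def read_hosts(regionservers_text):
    """Return hostnames, skipping blank lines and '#' comments."""
    hosts = []
    for line in regionservers_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def pmdumptext_command(hosts, metrics, interval_seconds=5):
    """Build a pmdumptext command sampling every metric on every host."""
    args = ["pmdumptext", "-t", str(interval_seconds)]
    for host in hosts:
        for metric in metrics:
            # Assumed spec format: host:metric
            args.append(f"{host}:{metric}")
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    with open("hbase/conf/regionservers") as f:
        hosts = read_hosts(f.read())
    print(pmdumptext_command(hosts, ["kernel.all.load"]))
```

The printed command can be pasted into a shell on any machine with the PCP 
client tools installed; the same host list could equally feed a generated 
pmchart view.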

For retrospective logging/archive purposes there are a few additional steps to 
configure which metrics you want to log and how frequently, but that's pretty 
simple.
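
As an illustration of the logging configuration, the sketch below generates 
pmlogger-style "log mandatory on every <interval> { ... }" blocks from a list 
of (interval, metrics) groups. The syntax is written from memory and should be 
checked against the pmlogger man page; the metric names are ordinary OS 
metrics chosen as examples, and pmlogger_config is a made-up helper name.

```python
def pmlogger_config(groups):
    """groups is a list of (interval, [metric names]) pairs."""
    blocks = []
    for interval, metrics in groups:
        body = "\n".join(f"    {metric}" for metric in metrics)
        blocks.append(f"log mandatory on every {interval} {{\n{body}\n}}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(pmlogger_config([
        ("10 seconds", ["kernel.all.load", "mem.util.free"]),
        ("1 minute", ["disk.all.total"]),
    ]))
```

Grouping metrics by interval keeps the fast-changing ones frequent without 
bloating the archive with slow-moving ones.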

I'm really hoping to be able to provide the jar soon, along with some good 
steps for someone to try out. In the meantime I would honestly recommend just 
installing the base PCP packages on the cluster, because I think you'll find 
that monitoring the base hardware and OS of the cluster is very interesting and 
useful on its own.

I don't like talking vapourware, and I'm really sorry I haven't completed this 
in a form I'm comfortable sharing in more detail, but please bear with me a bit 
longer.

If anyone has any more questions about what it might/could do, fire away. I'd 
like to discuss what, in an ideal world, you'd like to have in cluster 
monitoring/retrospective analysis, so I can use these concrete cases to show 
where/how this setup would be of high value.

Paul

On 08/04/2010, at 2:20 PM, Stack wrote:

> On Tue, Apr 6, 2010 at 6:10 PM, Paul Smith <psm...@aconex.com> wrote:
>>> 
>>> 
> ...
>> anyway, some more ideas to kick around and discuss.
>> 
> 
> What do we have to do to get it running on one of our clusters Paul?
> St.Ack
