Hi,
I'm still relatively new to Hadoop here, so bear with me. We have a
few ex-SGI staff with us, and one of the tools we now use at Aconex is
Performance Co-Pilot (PCP), an open-source performance
monitoring suite out of Silicon Graphics (see [1]). SGI is rather
fond of large-scale problems, and this toolset was built to support
their own monster computers; see [2] for one of their clients (yes,
that's one large single computer). PCP was used to monitor and tune
that machine, so I'm pretty confident it has the credentials to help
with Hadoop.
Aconex has built a Java bridge to PCP and has open-sourced that as
Parfait (see [3]). We rely on this for real-time monitoring and
post-problem retrospective analysis; we would be dead in the water
without it. By combining hardware and software metrics from multiple
machines into a single warehouse of data, we can correlate many
interesting things and solve problems very quickly.
Now I want to unleash this on Hadoop. I have written a MetricsContext
extension that uses the bridge, and I can export counters and values
to PCP for the namenode, datanode, jobtracker and tasktracker. We are
building some small tool extensions to allow 3D visualization. First
fledgling view of what it looks like is here:
http://people.apache.org/~psmith/clustervis.png
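For the curious, plugging a custom context into the stock metrics
framework is just a matter of editing conf/hadoop-metrics.properties on
each node. A sketch of what that wiring looks like (the class name
below is only a placeholder for our PCP-backed implementation, and the
period values are illustrative):

```properties
# conf/hadoop-metrics.properties -- illustrative only; the class name
# is a placeholder, not the real name of our implementation
dfs.class=com.aconex.metrics.pcp.PcpContext
dfs.period=10
mapred.class=com.aconex.metrics.pcp.PcpContext
mapred.period=10
jvm.class=com.aconex.metrics.pcp.PcpContext
jvm.period=10
```

Each daemon (namenode, datanode, jobtracker, tasktracker) picks up its
section of this file at startup, so no code changes are needed on the
Hadoop side to switch contexts.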
Yes, it's a pretty trivial cluster at the moment, but the toolset can
build the cluster view from a simple configuration by passing it the
masters/slaves files. Once the PCP tools connect to each node through
my MetricsContext implementation, they can work out whether it's a
namenode, a jobtracker, etc., and display it differently. We hope to
improve on the tools to utilise the DNSToSwitchMapping style to then
visualize all the nodes within the cluster as they would appear in the
rack. PCP already has support for Cisco switches so we can also
integrate those into the picture and display inter-rack networking
volumes. The real payoff here is retrospective analysis: all this
PCP data is collected into archives, so this view can be replayed at
any time, and at any pace you want. Very interesting problems come to
light when you have that sort of tool.
I guess my question is whether anyone else thinks this is going to be
of value to the wider Hadoop community? Obviously we do, but we're
not exactly stretching Hadoop just yet, nor do we fully understand
some of the tricky performance problems large Hadoop cluster admins
face. We'd love to contribute this to hadoop-contrib, though, in the
hope that others might find it useful.
So if anyone has questions, or suggestions for crucial features, we'd
appreciate hearing them.
cheers (and thanks for getting this far in the email.. :) )
Paul Smith
psmith at aconex.com
psmith at apache.org
[1] Performance Co-Pilot (PCP)
http://oss.sgi.com/projects/pcp/index.html
[2] NASA's 'Columbia' computer
http://www.nas.nasa.gov/News/Images/images.html
[3] Parfait
http://code.google.com/p/parfait/