Hi,

We're grad students at UC Berkeley working on a project to instrument Hadoop using an open-source path-based tracing framework called X-Trace (www.x-trace.net/wiki). X-Trace captures causal dependencies between events in addition to timings, letting developers analyze not just performance but also the context and dependencies of various events. We have created a web-based trace analysis UI that shows the performance of different IPC calls, DFS operations, and phases of a MapReduce job. The goal is to let users easily spot the origin of unusual behavior in a running system from a central location. We believe this kind of tracing can be useful for performance tuning and debugging in both development and production environments.

We'd like to get feedback on our work and suggestions on what trace analyses would be useful to Hadoop developers and users. Some of the reports we currently generate include machine utilization over time, relative performance of different tasks, and performance of DFS operations. You can see an example set of reports at http://www.cs.berkeley.edu/~matei/xtrace_sample_task.html (this is a trace of a Nutch indexing job). You can also read our project journal at http://radlab.cs.berkeley.edu/wiki/Projects/Monitoring_Hadoop_through_Tracing. We've already spotted some interesting issues, such as map tasks and DFS reads/writes that are an order of magnitude slower than average, and we are investigating possible causes. Most importantly, the UI lets a user easily see where the system is spending time and reason about how to tune it, and it provides much more information than the progress data in the JobTracker UI. As a Hadoop developer, what kinds of questions about running jobs would you like answered that are hard to obtain from process logs alone?

Once we've had a discussion about features for a trace analysis UI, we would like to contribute our work to the Hadoop codebase. We will create a JIRA issue and submit a patch adding this functionality. We're also interested in seeing whether we can integrate X-Trace logging more tightly with the existing Apache logging in Hadoop.

Finally, we are currently experimenting on relatively small (<50-node) clusters here at Berkeley, but we would really like to try tracing some large (>1000-node) clusters. If anyone is interested in evaluating performance on such a cluster, we would be very happy to discuss how to set up X-Trace and to provide you with a patch.

Thanks,

Andy Konwinski and Matei Zaharia
