Hi! My colleague and I have implemented a logging system that collects reports about Hadoop network traffic in a centralized "Statistic Server". We collect information about Mapper Inputs, Reducer Inputs and HDFS Writes at the transfer level, rather than as a total number of bytes per task (which is what the existing counters provide). Our original goal was a system that could track network performance in the cluster in real time so that scheduling adjustments could be made on the fly (hence the centralized "Statistic Server"; the system can also be configured, through the XML configuration files, to log locally on each machine instead). We eventually used it to investigate the effect of network speed on job running time, particularly in the context of clusters deployed across the Internet.
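As a rough illustration of how it is switched on and pointed at a Statistic Server, the configuration looks something like the sketch below (the property names here are placeholders for this message, not the exact keys we use):

    <configuration>
      <!-- Placeholder keys for illustration only -->
      <!-- Master switch; when false (the default), the feature is inert -->
      <property>
        <name>mapred.transfer.stats.enabled</name>
        <value>true</value>
      </property>
      <!-- Where transfer-level reports are sent -->
      <property>
        <name>mapred.transfer.stats.server.address</name>
        <value>statserver.example.com:9099</value>
      </property>
      <!-- Set to true to log on each machine instead of reporting centrally -->
      <property>
        <name>mapred.transfer.stats.local.only</name>
        <value>false</value>
      </property>
    </configuration>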
We would like to gauge interest in the Hadoop community in this feature, as we would like to contribute it to the project. It is aimed mostly at research users: both those who use Hadoop as a research platform and those who study the workings and performance of Hadoop itself (we are in the second category ourselves). It could also be useful to anyone who wants to analyze the data flow through the various stages of a Hadoop computation in their jobs; in turn, this should help uncover possible optimizations. The feature has no effect on Hadoop when disabled, which it is by default. If there is interest, please let us know what, if anything, we should elaborate on further. Thanks, Lev Faerman and Aviad Pines.