Hi Sesha

For benchmarking purposes there are multiple options available. We use the job metrics available from the JobTracker web UI to capture MapReduce statistics such as:

- Timings at the atomic level (map, sort and shuffle, reduce) as well as execution time at a higher level. This helps you understand where your job is expensive and lets you fine-tune it at a more granular level.
- Data sizes and record counts, reads and writes to and from HDFS and the local file system, and so on.

Resource utilization metrics like CPU usage, memory usage, memory swaps, I/O usage etc. can be captured from Ganglia.
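If you want to pull the same numbers programmatically (say, for scripted benchmark runs), the built-in job counters can be read through the old mapred client API as well. Below is a minimal sketch, assuming a Hadoop 1.x-style cluster; the counter group and counter names ("FileSystemCounters", HDFS_BYTES_READ, MAP_INPUT_RECORDS, ...) are the ones I recall from that release line, so double-check them against your distribution, and the job id is just a placeholder argument.

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class JobCounterDump {
        public static void main(String[] args) throws Exception {
            // Connects to the JobTracker configured in the client's *-site.xml files.
            JobClient client = new JobClient(new JobConf());

            // Look up a job by its id, e.g. job_201201050001_0042 (placeholder).
            RunningJob job = client.getJob(JobID.forName(args[0]));
            Counters counters = job.getCounters();

            // Bytes moved to/from HDFS and the local file system.
            System.out.println("HDFS bytes read:    "
                    + counters.findCounter("FileSystemCounters", "HDFS_BYTES_READ").getCounter());
            System.out.println("HDFS bytes written: "
                    + counters.findCounter("FileSystemCounters", "HDFS_BYTES_WRITTEN").getCounter());
            System.out.println("Local bytes read:   "
                    + counters.findCounter("FileSystemCounters", "FILE_BYTES_READ").getCounter());

            // Record counts from the framework's task counters.
            System.out.println("Map input records:  "
                    + counters.findCounter("org.apache.hadoop.mapred.Task$Counter",
                                           "MAP_INPUT_RECORDS").getCounter());
            System.out.println("Reduce output recs: "
                    + counters.findCounter("org.apache.hadoop.mapred.Task$Counter",
                                           "REDUCE_OUT_RECORDS".equals("") ? "" : "REDUCE_OUTPUT_RECORDS").getCounter());
        }
    }

Compile it against the Hadoop jars already on the cluster and pass a job id. If you just want a quick look without writing code, the hadoop job -status command prints much of the same counter information for a given job id.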
Regards
Bejoy.K.S

On Thu, Jan 5, 2012 at 8:51 PM, Sesha Kumar <sesha...@gmail.com> wrote:
> Hi guys,
> Am trying to implement some solutions for small file problem in hdfs as
> part of my project work.
> I got my own set of files stored in my hadoop cluster.
> I need a tool or method to test and establish benchmarks for
> 1. memory, performance of read and write operations etc
> 2. performance of mapreduce jobs
> on the stored files. The tool must also take regard of my solution when
> establishing above benchmarks.
> Please suggest some tools for the above purpose. ( for example, those used
> for establishing research benchmarks)
> Thanks in advance!
>