Hi,

I am looking for information on Hadoop tracing, instrumentation, benchmarking and so forth. What utilities exist? How mature are they? Where can I get more information about them?
I am also curious about statistics on Hadoop behavior, either for a typical workload or across different workloads. I am thinking of metrics such as the percentage of time a Hadoop job spends in its various phases (map, sort and shuffle, reduce), on I/O, on the network, in framework execution, and in user code execution. Are there known bottlenecks? Are there any other interesting statistics? Has anyone already measured these, and are there documented statistics out there?

I have already come across several things: the X-Trace based tracing tool from Berkeley, the Hadoop metrics API, the Hadoop instrumentation API (HADOOP-3772), Hadoop Vaidya (HADOOP-4179), and the gridmix benchmark. Does anyone have input on any of those? Is there anything else I missed?

Thanks for any direction,
Naama

--
"If you want your children to be intelligent, read them fairy tales. If you want them to be more intelligent, read them more fairy tales." (Albert Einstein)
