Hi,

I am looking for information on Hadoop tracing, instrumentation,
benchmarking, and so forth.
What utilities exist? How mature are they? Where can I get more information
about them?

I am curious about statistics on Hadoop behavior (for a typical workload, or
across different workloads). I am thinking of metrics such as:
- Percentage of time a Hadoop job spends in each phase (map, sort &
  shuffle, reduce), and on I/O, network, framework execution, and user code
  execution
- Known bottlenecks
- Any other interesting statistics

Has anyone already measured these? Are there any documented statistics out
there?

I have already come across several tools: the X-Trace-based tracing tool from
Berkeley, the Hadoop metrics API, the Hadoop instrumentation API
(HADOOP-3772), Hadoop Vaidya (HADOOP-4179), and the GridMix benchmark.

Does anyone have input on any of those?
Is there anything else I missed?

Thanks for any direction,
Naama

