Hi Ashish,
I believe that Ari committed two instrumentation classes,
TaskTrackerInstrumentation and JobTrackerInstrumentation (both in
src/mapred/org/apache/hadoop/mapred), that can give you information on
when components of your M/R jobs start and stop. I'm in the process of
writing some additional instrumentation APIs that collect timing
information about the RPC and HDFS layers, and will hopefully be able to
submit a patch in a few weeks.
Thanks,
George
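Here is what "information on when components of your M/R jobs start and stop" can give you. If you have start and stop timestamps for each task, you can turn them into durations and find the slowest task with very little code. The Java sketch below is hypothetical. TaskTimeline and its methods are invented for illustration. They are not the TaskTrackerInstrumentation/JobTrackerInstrumentation API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical recorder of task start/stop events. NOT the Hadoop
 * instrumentation API; it only illustrates the kind of timing data
 * that start/stop hooks make available.
 */
class TaskTimeline {
    // Start and end timestamps (milliseconds), keyed by a task id string.
    private final Map<String, Long> starts = new LinkedHashMap<>();
    private final Map<String, Long> ends = new LinkedHashMap<>();

    void taskStarted(String taskId, long timeMillis) {
        starts.put(taskId, timeMillis);
    }

    void taskFinished(String taskId, long timeMillis) {
        if (!starts.containsKey(taskId)) {
            throw new IllegalStateException("finish without start: " + taskId);
        }
        ends.put(taskId, timeMillis);
    }

    /** Duration in milliseconds, or -1 if the task has not finished. */
    long durationMillis(String taskId) {
        Long start = starts.get(taskId);
        Long end = ends.get(taskId);
        if (start == null || end == null) {
            return -1;
        }
        return end - start;
    }

    /** Id of the slowest finished task, or null if none has finished. */
    String slowestTask() {
        String slowest = null;
        long worst = -1;
        for (String id : ends.keySet()) {
            long d = durationMillis(id);
            if (d > worst) {
                worst = d;
                slowest = id;
            }
        }
        return slowest;
    }
}
```

For example, after taskStarted("map_0", 1000) and taskFinished("map_0", 4000), durationMillis("map_0") returns 3000. The sketch keeps the raw timestamps as well as the durations, so you can also see how tasks overlapped in time.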
Ashish Venugopal wrote:
Are you interested in simply profiling your own code (in which case you can
clearly use whatever Java profiler you want), or in profiling the way your
MapReduce job is put together, i.e. how much time is being spent in the Map
vs. the sort vs. the shuffle vs. the Reduce? I am not aware of a good
solution to the second problem; can anyone comment?
Ashish
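One partial approach to the second question works if you can collect start and end timestamps for each phase, for example from task logs or from instrumentation hooks like the ones George mentions. You then sum the time per phase and look at each phase's share. This is a hypothetical sketch: Phase and PhaseBreakdown are illustrative names, not Hadoop classes.

```java
import java.util.EnumMap;
import java.util.Map;

/** The four phases discussed above; names chosen for this sketch only. */
enum Phase { MAP, SHUFFLE, SORT, REDUCE }

/**
 * Hypothetical accumulator: sums time per phase from whatever timestamps
 * you manage to collect, and reports each phase's share of the total.
 */
class PhaseBreakdown {
    private final Map<Phase, Long> totals = new EnumMap<>(Phase.class);

    /** Adds the time spent in one phase between two timestamps (ms). */
    void record(Phase phase, long startMillis, long endMillis) {
        if (endMillis < startMillis) {
            throw new IllegalArgumentException("end before start");
        }
        totals.merge(phase, endMillis - startMillis, Long::sum);
    }

    /** Sum of all recorded intervals, in milliseconds. */
    long totalMillis() {
        long sum = 0;
        for (long v : totals.values()) {
            sum += v;
        }
        return sum;
    }

    /** Fraction (0.0 to 1.0) of the recorded time spent in the given phase. */
    double share(Phase phase) {
        long total = totalMillis();
        if (total == 0) {
            return 0.0;
        }
        long inPhase = totals.getOrDefault(phase, 0L);
        return (double) inPhase / total;
    }
}
```

Map and reduce tasks run in parallel, and the shuffle can begin while maps are still running. The summed intervals therefore show where task time goes, not the job's wall-clock time, and the shares are best read as relative weights.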
On Wed, Oct 8, 2008 at 12:06 PM, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
Just run your MapReduce job locally and connect your profiler. I use
YourKit.
Works great!
You can profile your MapReduce job by running it in local mode, like any
other Java app.
However, we have also profiled on a grid. You just need to install the YourKit
agent into the JVM of the node you want to profile, and then connect to the
node while the job runs.
Note that you need to time things well, since the task JVM is shut down as
soon as your job is done.
Stefan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc., Menlo Park, California
web: http://www.101tec.com
blog: http://www.find23.net
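When you profile on a grid, it helps to check from inside the task JVM that the agent actually loaded, and to find out which process to connect to. Below is a minimal Java sketch that uses the standard java.lang.management API. It rests on several assumptions. First, the agent is passed as an -agentpath or -agentlib option. In Hadoop of this era, one place to add options for task JVMs is the mapred.child.java.opts property. Second, the "yjp" marker and the example path are guesses at YourKit's library name. ProfilerAgentCheck is a hypothetical helper.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/**
 * Hypothetical helper: checks whether JVM startup arguments load a native
 * profiler agent, e.g. "-agentpath:/opt/yjp/bin/libyjpagent.so" (example
 * path is an assumption, as is the "yjp" marker for YourKit).
 */
class ProfilerAgentCheck {
    /** True if some argument loads a native agent whose text contains marker. */
    static boolean hasAgent(List<String> jvmArgs, String marker) {
        for (String arg : jvmArgs) {
            boolean isAgent = arg.startsWith("-agentpath:") || arg.startsWith("-agentlib:");
            if (isAgent && arg.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Options the current JVM was started with (not the main-class args). */
    static List<String> currentJvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Usually "pid@hostname" on HotSpot, but the format is not guaranteed. */
    static String currentJvmName() {
        return ManagementFactory.getRuntimeMXBean().getName();
    }
}
```

You could log currentJvmName() and hasAgent(currentJvmArgs(), "yjp") early in a task. That writes the process id and agent status to the task's log, so you know which JVM to attach to before it exits.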
On Oct 8, 2008, at 11:27 AM, Gerardo Velez wrote:
Hi!
I've developed a Map/Reduce algorithm to analyze some logs from a web
application.
So basically, we are ready to start the QA test phase, and now I would like to
know how efficient my application is
from a performance point of view.
So is there any procedure I could use to do some profiling?
Basically I need basic data, like execution time or code bottlenecks.
Thanks in advance.
-- Gerardo Velez
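For the "execution time or code bottlenecks" part, a low-tech complement to a profiler is to time the suspect sections of your own map and reduce code and compare the totals. This is a minimal sketch that assumes Java 8 or later for the lambda syntax. SectionTimer and the section names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical per-section timer: wrap each suspect section (parsing,
 * lookups, output) and compare accumulated time and call counts.
 */
class SectionTimer {
    private final Map<String, Long> nanosBySection = new LinkedHashMap<>();
    private final Map<String, Long> callsBySection = new LinkedHashMap<>();

    /** Runs the section and adds its elapsed time to the named total. */
    void time(String section, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            long elapsed = System.nanoTime() - start;
            nanosBySection.merge(section, elapsed, Long::sum);
            callsBySection.merge(section, 1L, Long::sum);
        }
    }

    long totalNanos(String section) {
        return nanosBySection.getOrDefault(section, 0L);
    }

    long calls(String section) {
        return callsBySection.getOrDefault(section, 0L);
    }

    /** Name of the section with the largest accumulated time, or null. */
    String slowestSection() {
        String slowest = null;
        long worst = -1;
        for (Map.Entry<String, Long> e : nanosBySection.entrySet()) {
            if (e.getValue() > worst) {
                worst = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }
}
```

Inside a mapper, you might wrap parsing code as timer.time("parse", ...) and print the totals when the task finishes, so they end up in that task's log. System.nanoTime() measures elapsed time rather than wall-clock dates, which makes it the appropriate clock here.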
--
George Porter, Sun Labs/CTO
Sun Microsystems - San Diego, Calif.
[EMAIL PROTECTED] 1.858.526.9328