That code is in. Unfortunately, it doesn't quite solve the problem; you'd need to do some more work. You'd have to write subclasses that emit the statistics you want, then set the appropriate options in hadoop-site.xml so that those classes get loaded.
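
To make that concrete, here is a rough sketch of the pattern in plain Java. It is not the Hadoop API: TaskInstrumentation, TimingInstrumentation, InstrumentationLoader, the hook signatures, and the property name example.task.instrumentation are all made-up stand-ins, and a java.util.Properties object stands in for the site configuration. Check the real TaskTrackerInstrumentation and JobTrackerInstrumentation sources for the actual hooks and option names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for a framework instrumentation base class.
// Every hook is a no-op, so a subclass overrides only what it needs.
class TaskInstrumentation {
    void reportTaskLaunch(String taskId, long nowMillis) { }
    void reportTaskEnd(String taskId, long nowMillis) { }
}

// Subclass that records how long each task ran, in milliseconds.
class TimingInstrumentation extends TaskInstrumentation {
    private final Map<String, Long> startTimes = new LinkedHashMap<>();
    private final Map<String, Long> durations = new LinkedHashMap<>();

    @Override
    void reportTaskLaunch(String taskId, long nowMillis) {
        startTimes.put(taskId, nowMillis);
    }

    @Override
    void reportTaskEnd(String taskId, long nowMillis) {
        Long start = startTimes.remove(taskId);
        if (start != null) {
            durations.put(taskId, nowMillis - start);
        }
    }

    Map<String, Long> getDurations() {
        return durations;
    }
}

// Picks the instrumentation class by name from configuration,
// falling back to the no-op base class when the option is unset.
class InstrumentationLoader {
    // Assumed property name, for illustration only.
    static final String KEY = "example.task.instrumentation";

    static TaskInstrumentation load(Properties conf) throws Exception {
        String className = conf.getProperty(KEY, TaskInstrumentation.class.getName());
        return (TaskInstrumentation) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }
}

class InstrumentationDemo {
    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.setProperty(InstrumentationLoader.KEY, TimingInstrumentation.class.getName());

        TaskInstrumentation inst = InstrumentationLoader.load(conf);
        inst.reportTaskLaunch("task_0001_m_000000", 1000L);
        inst.reportTaskEnd("task_0001_m_000000", 4500L);

        // Prints {task_0001_m_000000=3500}
        System.out.println(((TimingInstrumentation) inst).getDurations());
    }
}
```

In a real daemon the hooks may well be called from several threads at once, so the maps above would probably need synchronization before you trust the numbers.
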

On Wed, Oct 8, 2008 at 12:30 PM, George Porter <[EMAIL PROTECTED]> wrote:
> Hi Ashish,
>
> I believe that Ari committed two instrumentation classes,
> TaskTrackerInstrumentation and JobTrackerInstrumentation, (both in
> src/mapred/org/apache/hadoop/mapred) that can give you information on when
> components of your M/R jobs start and stop. I'm in the process of writing
> some additional instrumentation APIs that collect timing information about
> the RPC and HDFS layers, and will hopefully be able to submit a patch in a
> few weeks.
>
> Thanks,
> George
>
> Ashish Venugopal wrote:
>>
>> Are you interested in simply profiling your own code (in which case you
>> can clearly use what ever java profiler you want), or your construction
>> of the MapReduce job, ie how much time is being spent in the Map vs the
>> sort vs the shuffle vs the Reduce. I am not aware of a good solution to
>> the second problem, can anyone comment?
>>
>> Ashish
>>
>> On Wed, Oct 8, 2008 at 12:06 PM, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>>
>>> Just run your map reduce job local and connect your profiler. I use
>>> yourkit.
>>> Works great!
>>> You can profile your map reduce job running the job in local mode as ant
>>> other java app as well.
>>> However we also profiled in a grid. You just need to install the yourkit
>>> agent into the jvm of the node you want to profile and than you connect
>>> to the node when the job runs.
>>> However you need to time things well, since the task jvm is shutdown as
>>> soon your job is done.
>>> Stefan
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> 101tec Inc., Menlo Park, California
>>> web: http://www.101tec.com
>>> blog: http://www.find23.net
>>>
>>> On Oct 8, 2008, at 11:27 AM, Gerardo Velez wrote:
>>>
>>> Hi!
>>>>
>>>> I've developed a Map/Reduce algorithm to analyze some logs from web
>>>> application.
>>>>
>>>> So basically, we are ready to start QA test phase, so now, I would like
>>>> to now how efficient is my application
>>>> from performance point of view.
>>>>
>>>> So is there any procedure I could use to do some profiling?
>>>>
>>>> Basically I need basi data, like time excecution or code bottlenecks.
>>>>
>>>> Thanks in advance.
>>>>
>>>> -- Gerardo Velez
>>>>
>
> --
> George Porter, Sun Labs/CTO
> Sun Microsystems - San Diego, Calif.
> [EMAIL PROTECTED] 1.858.526.9328

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department
