Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

New page:
= How to Debug Map/Reduce Programs =

Debugging distributed programs is always difficult, because very few debuggers can attach to 
a remote program that wasn't started with the right command-line arguments.

 1. Start by getting everything running (ideally on a small input) in the local runner.
    You do this by setting your job tracker to "local" in your config (a configuration
    sketch follows this list). The local runner runs on your development machine, so you
    can run it under a debugger.

 2. Run the small input on a one-node cluster. This will smoke out all of the issues that
    come with distribution and the "real" task runner, but you only have a single place to
    look at logs. The task tracker and job tracker logs are the most useful. Make sure you
    are logging at the INFO level, or you will miss clues like the output of your tasks
    (a sketch of logging from within a task also follows this list).

 3. Run on a big cluster. Recently, I added the keep.failed.task.files config variable,
    which tells the system to keep the files for tasks that fail. This leaves the "dead"
    files around for you to debug with. On the node with the failed task, go to the task
    tracker's local directory, cd to ''<local>''/taskTracker/''<taskid>'', and run
    {{{
% hadoop org.apache.hadoop.mapred.IsolationRunner job.xml
    }}}
    This will run the failed task in a single JVM, which you can put under a debugger, over
    precisely the same input.
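
Here is a minimal sketch of step 1, assuming you force the local runner from a Java driver
through the JobConf API; the class name is a placeholder, and you still need to plug in your
own mapper, reducer, and paths:
{{{
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class LocalRunnerDebug {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(LocalRunnerDebug.class);
    // Point the job tracker at "local" so the whole job runs in this process,
    // where an ordinary Java debugger can step through the map and reduce code.
    conf.set("mapred.job.tracker", "local");
    // (assumption) keep the file system local too, so the small test input can
    // live on your development machine
    conf.set("fs.default.name", "local");
    // ... set your mapper, reducer, input and output paths here ...
    JobClient.runJob(conf);
  }
}
}}}
Launching a class like this from your IDE is usually enough to hit breakpoints in your map
and reduce methods.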

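For step 2, anything your tasks log ends up in the per-task logs on the node that ran the
task. Here is a minimal sketch of emitting your own INFO-level messages with Commons Logging
(the same logging API Hadoop itself uses); the class and method names are placeholders:
{{{
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class RecordLogger {
  private static final Log LOG = LogFactory.getLog(RecordLogger.class);

  // Call this from inside your map() or reduce() method; the INFO lines show
  // up in the task logs that step 2 tells you to read.
  public static void logRecord(Object key, Object value) {
    LOG.info("processing key=" + key + " value=" + value);
  }
}
}}}
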
There is also a configuration variable (keep.task.files.pattern) that lets you specify, by 
name, a task whose files should be kept even if it doesn't fail. Other than that, logging is 
your friend.
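
If it is more convenient to set these switches from your job driver than from the config
files, a sketch like the following should work; the helper name and the example task id
pattern are made up, so substitute your own:
{{{
import org.apache.hadoop.mapred.JobConf;

public class KeepTaskFiles {
  /** Sketch: turn on the keep-files behavior for a job's configuration. */
  public static void configure(JobConf conf) {
    // Keep the files of failed tasks so they can be re-run with the
    // IsolationRunner described in step 3.
    conf.set("keep.failed.task.files", "true");
    // (assumption) keep a particular task's files by name even if it
    // succeeds; the pattern is matched against task ids, and this one is
    // purely an example.
    conf.set("keep.task.files.pattern", ".*_m_000123_0");
  }
}
}}}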
