[Lucene-hadoop Wiki] Update of "HowToDebugMapReducePrograms" by OwenOMalley

Apache Wiki Sun, 17 Jun 2007 22:54:17 -0700

The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

------------------------------------------------------------------------------
  
  In order to debug Pipes programs you need to keep the downloaded commands. 
  
- First, to keep the !TaskTracker from deleting the files when the task is 
finished, you need to set either keep.failed.task.files (set to true if the 
task you want to debug fails) or keep.task.files.pattern (set to a regex of the 
task name you want to debug).
+ First, to keep the !TaskTracker from deleting the files when the task
finishes, set either keep.failed.task.files (to true, if the task of interest
always fails) or keep.task.files.pattern (to a regex that matches the name of
the task of interest).
  
- Second, your job should set hadoop.pipes.command-file.keep to true in the 
JobConf. This will cause all of the tasks in the job to write their command 
stream to a file in the working directory named downlink.data. This file will 
contain the JobConf, the task information, and the task input, so it may be 
large. But it provides enough information that your executable will run without 
any interaction with the framework. 
+ Second, your job should set hadoop.pipes.command-file.keep to true in the
!JobConf. This causes every task in the job to write its command stream to a
file named downlink.data in the working directory. The file contains the
!JobConf, the task information, and the task input, so it may be large, but it
holds enough information for your executable to run without any interaction
with the framework.
  
  Third, go to the host where the problem task ran, go into the work directory 
and
  {{{
  setenv hadoop.pipes.command.file downlink.data
  }}}
- and run your executable under the debugger or valgrind. It will run as if the 
framework was feeding it commands and data and produce a output file 
downlink.data.out with the binary commands that it would have sent up to the 
framework. I guess eventually, I should have the output file be written in text 
rather than binary...
+ and run your executable under the debugger or valgrind. It will run as if the
framework were feeding it commands and data, and it will produce an output file,
downlink.data.out, containing the binary commands that it would have sent up to
the framework. Eventually, I'll probably make the downlink.data.out file a
text-based format, but for now it is binary. Most problems, however, will be
pretty clear in the debugger or valgrind, even without looking at the generated
data.
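
For readers who are not using csh: Bourne-style shells such as bash refuse to export a variable whose name contains dots, so the setenv line above has no direct equivalent there. A minimal sketch of one workaround, assuming an env(1) that passes NAME=VALUE pairs through unchanged (GNU env does) and that printenv is available; the helper name run_with_downlink is hypothetical and not part of Hadoop:

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): run a command with the Pipes
# replay variable set in its environment. The variable name contains dots,
# so it is handed to env(1) as a NAME=VALUE argument instead of being
# exported from the shell.
run_with_downlink() {
    cmd_file=$1
    shift
    env "hadoop.pipes.command.file=$cmd_file" "$@"
}

# Stand-in for the real executable: show what the child process would see.
run_with_downlink downlink.data printenv hadoop.pipes.command.file
```

To debug a real task, the printenv stand-in would be replaced by the actual command line, for example valgrind followed by the path to your Pipes executable, run from the task's work directory.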
