Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by Amareshwari:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

------------------------------------------------------------------------------
  
  == Run a debug script when Task fails ==
  
- A facility is provided, via user-provided scripts, for doing post-processing 
on task logs, task's stdout, stderr, core file. There is a default script which 
processes core dumps under gdb and prints stack trace. The last five lines from 
stdout and stderr of debug script are printed on the diagnostics. These outputs 
are displayed job UI on demand. 
+ A facility is provided, via user-provided scripts, for doing post-processing 
on task logs and on a task's stdout, stderr, and core file. There is a default 
script which processes core dumps under gdb and prints the stack trace. The 
last five lines of the debug script's stdout and stderr are printed on the 
diagnostics. These outputs are displayed on the job UI on demand. 
  
  == How to submit debug command ==
  
@@ -74, +74 @@

  
  For example, the command can be 'myScript @stderr@'. Here, myScript is the 
executable, and it processes the failed task's stderr.
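
  For reference, a minimal sketch of wiring such a debug command into a job's 
configuration from Java is given below. It assumes the JobConf setters 
setMapDebugScript/setReduceDebugScript (available in later Hadoop releases) and 
the hypothetical script name myScript; the exact mechanism may differ by 
release.
  {{{
import org.apache.hadoop.mapred.JobConf;

public class DebugCommandSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Run 'myScript' over the failed task's stderr; the framework replaces
    // @stderr@ with the local path to the task's stderr file.
    conf.setMapDebugScript("myScript @stderr@");
    conf.setReduceDebugScript("myScript @stderr@");
  }
}
  }}}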
  
- The debug command can be a gdb command where user can submit a command file 
to execute using -x. 
+ The debug command can be a gdb command, where the user can submit a command 
file to execute using the -x option.
- Then debug command can look like 'gdb <program-name> -c @core@ -x <cmd-fle> 
'. This command processes core file of the failed task <program-name> and 
executes commands in <cmd-file>
+ Then the debug command can look like 'gdb <program-name> -c @core@ -x 
<gdb-cmd-file>'. This command processes the core file of the failed task 
<program-name> and executes the commands in <gdb-cmd-file>. Please make sure 
the gdb command file has 'quit' as its last line.
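
  As an illustration, a gdb command file for such a debug command could look 
like the sketch below; the commands shown are only an example, and any gdb 
commands can be used as long as the file ends with 'quit':
  {{{
# example <gdb-cmd-file>: list threads, print a full backtrace, then exit
info threads
backtrace full
quit
  }}}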
  
  == How to submit debug script ==
  
  To submit the debug script file, first put the file in dfs.
  
- Set the property "mapred.cache.executables" with value 
<path>#<executable-name>. Executable property can also be set by APIs 
DistributedCache.addCacheExecutable(URI,conf) and 
DistributedCache.setCacheExecutables(URI[],conf) where URI is of the form 
"hdfs://host:port/<path>#<executable-name>". For Streaming executable can be 
added through -cacheExecutable URI.
+ The executable can be added by setting the property 
"mapred.cache.executables" to the value <path>#<executable-name>. More than 
one executable can be added as comma-separated paths. 
+ The executables can also be set through the APIs 
DistributedCache.addCacheExecutable(URI,conf) and 
DistributedCache.setCacheExecutables(URI[],conf), where the URI is of the form 
"hdfs://host:port/<path>#<executable-name>".
+ For Streaming, the executable can be added through the -cacheExecutable URI 
option.
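
  As a sketch, the property-based way of registering the debug script from 
Java is shown below; the path and executable name (/user/debug/myScript) are 
hypothetical:
  {{{
import org.apache.hadoop.mapred.JobConf;

public class DebugScriptCacheSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // The part after '#' is the name the executable is symlinked as in the
    // task's working directory. (Path and name are hypothetical.)
    conf.set("mapred.cache.executables", "/user/debug/myScript#myScript");
    // Equivalently, the DistributedCache.addCacheExecutable(URI, conf) API
    // named above can be used with
    // "hdfs://host:port/user/debug/myScript#myScript".
  }
}
  }}}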
  
- For gdb, command file need not be executable. The command file needs to be in 
dfs. It can be added to cache by setting the property "mapred.cache.files" with 
the value <path>#<cmd-file> or through the API 
DistribuedCache.addCacheFile(URI,conf).
+ For gdb, the gdb command file need not be executable, but it does need to be 
in dfs. It can be added to the cache by setting the property 
"mapred.cache.files" to the value <path>#<gdb-cmd-file>, or through the API 
DistributedCache.addCacheFile(URI,conf).
  Please make sure the property "mapred.create.symlink" is set to "yes".
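
  Putting the gdb pieces together, a minimal sketch of this configuration from 
Java is given below; the HDFS host, path, and file name 
(hdfs://host:port/user/debug/gdb-cmds) are hypothetical:
  {{{
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class GdbCmdFileSetup {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    // Ship the gdb command file so the task sees it as 'gdb-cmds'
    // (hypothetical path and name).
    DistributedCache.addCacheFile(
        new URI("hdfs://host:port/user/debug/gdb-cmds#gdb-cmds"), conf);
    // Symlinks must be enabled so the name after '#' appears in the
    // task's working directory.
    conf.set("mapred.create.symlink", "yes");
  }
}
  }}}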
  
  = How to debug Hadoop Pipes programs =
