Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The following page has been changed by AmareshwariSriRamadasu:
http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms

------------------------------------------------------------------------------
  
  = Run a debug script when Task fails =
  
- A facility is provided, via user-provided scripts, for doing post-processing 
on task logs, task's stdout, stderr, syslog. For pipes, a default script is run 
which processes core dumps under gdb, prints stack trace and gives info about 
running threads. The stdout and stderr of debug script are printed on the 
diagnostics. These outputs are displayed on job UI on demand. 
+ When a map/reduce task fails, a facility is provided, via user-provided 
scripts, for doing post-processing on the task's logs, i.e. the task's stdout, 
stderr, and syslog. The stdout and stderr of the user-provided debug script are 
printed on the diagnostics. These outputs are displayed on the job UI on demand.
+ 
+ For pipes, a default script is run which processes core dumps under gdb, 
prints the stack trace, and gives information about the running threads.
+ 
+ In the following sections we discuss how to submit a debug script along with 
the job, and what the default behavior is.
+ To submit a debug script, it first has to be distributed; then the script has 
to be supplied in the configuration.
+ 
+ == How to submit debug script file ==
+ 
+ To submit the debug script file, first put the file in DFS. 
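+ For example, the script could be copied into DFS with the FileSystem API (a 
minimal sketch; the local and DFS paths below are placeholders):
{{{
// Sketch: copy the local debug script into DFS.
// Requires org.apache.hadoop.conf.Configuration,
// org.apache.hadoop.fs.FileSystem and org.apache.hadoop.fs.Path.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// "/debug/myscript" is a hypothetical DFS destination.
fs.copyFromLocalFile(new Path("myscript"), new Path("/debug/myscript"));
}}}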
+ 
+ The file can be distributed by setting the property "mapred.cache.files" with 
value <path>#<script-name>. More than one file can be added as comma-separated 
paths.
+ The script file needs to be symlinked.
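+ For instance, assuming the script was uploaded to the hypothetical DFS path 
used above, the property could be set like this (a sketch, not the only way):
{{{
// Sketch: distribute the script via the "mapred.cache.files" property.
// The name after '#' ("myscript") is the symlink the task will see.
conf.set("mapred.cache.files", "hdfs://host:port/debug/myscript#myscript");
}}}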
+ 
+ This property can also be set by the APIs 
[http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration) DistributedCache.addCacheFile(URI,conf)] and 
[http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#setCacheFiles DistributedCache.setCacheFiles(URIs,conf)], where the URI is of the form 
"hdfs://host:port/<absolutepath>#<script-name>".
+ For Streaming, the file can be added through the command line option -cacheFile.
+ To create a symlink for the file, set the property "mapred.create.symlink" to 
"yes". This can also be done with 
[http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration) DistributedCache.createSymlink].
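+ Putting these together, a minimal sketch using the DistributedCache APIs (the 
host, port, and path are placeholders):
{{{
// Sketch: distribute the script and create its symlink via DistributedCache.
// Requires org.apache.hadoop.filecache.DistributedCache and java.net.URI.
DistributedCache.addCacheFile(
    new URI("hdfs://host:port/debug/myscript#myscript"), conf);
DistributedCache.createSymlink(conf);  // equivalent to mapred.create.symlink=yes
}}}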
  
  == How to submit debug script ==
  
- A quick way to set debug script is to set the properties 
"mapred.map.task.debug.script" and "mapred.reduce.task.debug.script" for 
debugging map task and reduce task respectively. These properties can also be 
set by APIs JobConf.setMapDebugScript and JobConf.setReduceDebugScript.
+ A quick way to submit a debug script is to set values for the properties 
"mapred.map.task.debug.script" and "mapred.reduce.task.debug.script" for 
debugging map tasks and reduce tasks respectively. These properties can also be 
set by the APIs 
[http://hadoop.apache.org/core/api/org/apache/hadoop/mapred/JobConf.html#setMapDebugScript(java.lang.String) JobConf.setMapDebugScript] and 
[http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/mapred/JobConf.html#setReduceDebugScript(java.lang.String) JobConf.setReduceDebugScript].
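+ For example, the property-based route might look like this (a sketch; 
"./myscript" assumes the symlink name chosen when the script was distributed):
{{{
// Sketch: point the debug-script properties at the symlinked script.
conf.set("mapred.map.task.debug.script", "./myscript");
conf.set("mapred.reduce.task.debug.script", "./myscript");
}}}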
  The script is given the task's stdout, stderr, syslog, and jobconf files as arguments.
  The debug command, run on the node where the map/reduce task failed, is:
  
@@ -96, +113 @@

  
  {{{ $script $stdout $stderr $syslog $jobconf $program }}}
  
- 
- To submit the debug script file, first put the file in dfs. 
- 
- The file can be distributed by setting the property "mapred.cache.files" with 
value <path>#<script-name>. For more than one file, they can be added as comma 
seperated paths.
- The script file needs to be symlinked.
- 
- This property can also be set by APIs 
- 
[http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration)
 DistributedCache.addCacheFile(URI,conf)] and 
[http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#setCacheFiles
 DistributedCache.setCacheFiles(URIs,conf)] where URI is of the form 
"hdfs://host:port/<absolutepath>#<script-name>".
- For Streaming, the file can be added through command line option -cacheFile.
- To create symlink for the file, the property "mapred.create.symlink" is set 
to "yes". This can also be set by 
[http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)
 DistributedCache.createSymLink]
- 
  Here is an example of how to submit a script: 
  {{{
      jobConf.setMapDebugScript("./myscript");
@@ -115, +121 @@

  }}}
  
  == Default Behavior ==
+ The default behavior for failed map/reduce tasks is: 
  
  For Java programs:
  Stdout and stderr are shown on the job UI. The stack trace is printed on the diagnostics.
