Modified: hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=666624&r1=666623&r2=666624&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Wed Jun 11 04:39:55 2008
@@ -1068,33 +1068,109 @@
 <p>Users/admins can also specify the maximum virtual memory of the launched
 child-task using <code>mapred.child.ulimit</code>.</p>
 
- <p>When the job starts, the localized job directory
- <code> ${mapred.local.dir}/taskTracker/jobcache/$jobid/</code>
- has the following directories: </p>
+ <p>The task tracker has a local directory,
+ <code> ${mapred.local.dir}/taskTracker/</code> to create localized
+ cache and localized job. It can define multiple local directories
+ (spanning multiple disks) and then each filename is assigned to a
+ semi-random local directory. When the job starts, task tracker
+ creates a localized job directory relative to the local directory
+ specified in the configuration. Thus the task tracker directory
+ structure looks like the following: </p>
 <ul>
- <li> A job-specific shared directory, created at location
- <code>${mapred.local.dir}/taskTracker/jobcache/$jobid/work/ </code>.
- This directory is exposed to the users through
- <code>job.local.dir </code>. The tasks can use this space as scratch
- space and share files among them. The directory can accessed through
+ <li><code>${mapred.local.dir}/taskTracker/archive/</code> :
+ The distributed cache. This directory holds the localized distributed
+ cache. Thus localized distributed cache is shared among all
+ the tasks and jobs </li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/</code> :
+ The localized job directory
+ <ul>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/work/</code>
+ : The job-specific shared directory. The tasks can use this space as
+ scratch space and share files among them. This directory is exposed
+ to the users through the configuration property
+ <code>job.local.dir</code>. The directory can be accessed through
 api <a href="ext:api/org/apache/hadoop/mapred/jobconf/getjoblocaldir">
 JobConf.getJobLocalDir()</a>. It is available as System property also.
- So,users can call <code>System.getProperty("job.local.dir")</code>;
- </li>
- <li>A jars directory, which has the job jar file and expanded jar </li>
- <li>A job.xml file, the generic job configuration </li>
- <li>Each task has directory <code>task-id</code> which again has the
- following structure
+ So, users (streaming etc.) can call
+ <code>System.getProperty("job.local.dir")</code> to access the
+ directory.</li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/jars/</code>
+ : The jars directory, which has the job jar file and expanded jar.
+ The <code>job.jar</code> is the application's jar file that is
+ automatically distributed to each machine. It is expanded in jars
+ directory before the tasks for the job start. The job.jar location
+ is accessible to the application through the api
+ <a href="ext:api/org/apache/hadoop/mapred/jobconf/getjar">
+ JobConf.getJar() </a>. To access the unjarred directory,
+ JobConf.getJar().getParent() can be called.</li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/job.xml</code>
+ : The job.xml file, the generic job configuration, localized for
+ the job. </li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid</code>
+ : The task directory for each task attempt. Each task directory
+ again has the following structure :
 <ul>
- <li>A job.xml file, task localized job configuration </li>
- <li>A directory for intermediate output files</li>
- <li>The working directory of the task.
- And work directory has a temporary directory
- to create temporary files</li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/job.xml</code>
+ : A job.xml file, task localized job configuration. Task localization
+ means that properties have been set that are specific to
+ this particular task within the job. The properties localized for
+ each task are described below.</li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/output</code>
+ : A directory for intermediate output files. This contains the
+ temporary map reduce data generated by the framework
+ such as map output files etc. </li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work</code>
+ : The current working directory of the task. </li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work/tmp</code>
+ : The temporary directory for the task.
+ (User can specify the property <code>mapred.child.tmp</code> to set
+ the value of temporary directory for map and reduce tasks. This
+ defaults to <code>./tmp</code>. If the value is not an absolute path,
+ it is prepended with task's working directory. Otherwise, it is
+ directly assigned. The directory will be created if it doesn't exist.
+ Then, the child java tasks are executed with option
+ <code>-Djava.io.tmpdir='the absolute path of the tmp dir'</code>.
+ And pipes and streaming are set with environment variable,
+ <code>TMPDIR='the absolute path of the tmp dir'</code>). This
+ directory is created, if <code>mapred.child.tmp</code> has the value
+ <code>./tmp</code> </li>
 </ul>
 </li>
 </ul>
-
+ </li>
+ </ul>
+
+ <p>The following properties are localized in the job configuration
+ for each task's execution: </p>
+ <table>
+ <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+ <tr><td>mapred.job.id</td><td>String</td><td>The job id</td></tr>
+ <tr><td>mapred.jar</td><td>String</td>
+ <td>job.jar location in job directory</td></tr>
+ <tr><td>job.local.dir</td><td> String</td>
+ <td> The job specific shared scratch space</td></tr>
+ <tr><td>mapred.tip.id</td><td> String</td>
+ <td> The task id</td></tr>
+ <tr><td>mapred.task.id</td><td> String</td>
+ <td> The task attempt id</td></tr>
+ <tr><td>mapred.task.is.map</td><td> boolean </td>
+ <td>Is this a map task</td></tr>
+ <tr><td>mapred.task.partition</td><td> int </td>
+ <td>The id of the task within the job</td></tr>
+ <tr><td>map.input.file</td><td> String</td>
+ <td> The filename that the map is reading from</td></tr>
+ <tr><td>map.input.start</td><td> long</td>
+ <td> The offset of the start of the map input split</td></tr>
+ <tr><td>map.input.length </td><td>long </td>
+ <td>The number of bytes in the map input split</td></tr>
+ <tr><td>mapred.work.output.dir</td><td> String </td>
+ <td>The task's temporary output directory</td></tr>
+ </table>
+
+ <p>The standard output (stdout) and error (stderr) streams of the task
+ are read by the TaskTracker and logged to
+ <code>${HADOOP_LOG_DIR}/userlogs</code></p>
+
 <p>The <a href="#DistributedCache">DistributedCache</a> can also be used
 as a rudimentary software distribution mechanism for use in the map
 and/or reduce tasks. It can be used to distribute both jars and
Modified: hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/site.xml?rev=666624&r1=666623&r2=666624&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/site.xml Wed Jun 11 04:39:55 2008
@@ -167,6 +167,7 @@
 <setmapoutputcompressiontype href="#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)" />
 <setmapoutputcompressorclass href="#setMapOutputCompressorClass(java.lang.Class)" />
 <getjoblocaldir href="#getJobLocalDir()" />
+ <getjar href="#getJar()" />
 </jobconf>
 <jobconfigurable href="JobConfigurable.html">
 <configure href="#configure(org.apache.hadoop.mapred.JobConf)" />
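
Note on the mapred.child.tmp rule documented in the mapred_tutorial.xml change: a value that is not an absolute path is prepended with the task's working directory, an absolute value is used directly, and the result is handed to child JVMs as -Djava.io.tmpdir. The following is a minimal Java sketch of that rule only, not the TaskTracker's actual code. The class name, method names, and example paths are hypothetical, it assumes Unix-style paths, and it does not create the directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration; none of these names are part of the Hadoop API.
public class ChildTmpDirSketch {

    // Resolve a configured mapred.child.tmp value against the task's
    // working directory: relative values are prepended with the working
    // directory, absolute values are used as given.
    static Path resolveTmpDir(String configuredTmp, Path taskWorkDir) {
        Path tmp = Paths.get(configuredTmp);
        if (!tmp.isAbsolute()) {
            tmp = taskWorkDir.resolve(tmp);
        }
        // normalize() drops the "." segment that "./tmp" would leave behind.
        return tmp.normalize();
    }

    // Build the JVM option described in the tutorial text for child java tasks.
    static String javaTmpOption(Path tmpDir) {
        return "-Djava.io.tmpdir=" + tmpDir;
    }

    public static void main(String[] args) {
        // Illustrative working directory; real values come from the task tracker.
        Path workDir = Paths.get("/data/mapred/taskTracker/jobcache/job_x/task_y/work");

        // Default value: relative, so it lands under the working directory.
        System.out.println(javaTmpOption(resolveTmpDir("./tmp", workDir)));

        // Absolute value: assigned directly.
        System.out.println(javaTmpOption(resolveTmpDir("/scratch/tmp", workDir)));
    }
}
```

The same resolved path would be the value exported as TMPDIR for pipes and streaming tasks, per the documentation text in the diff.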
