Modified: lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=610135&r1=610134&r2=610135&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Tue Jan 8 12:32:29 2008
@@ -1003,6 +1003,64 @@
     </section>

     <section>
+      <title>Task Execution &amp; Environment</title>
+
+      <p>The <code>TaskTracker</code> executes the <code>Mapper</code>/
+      <code>Reducer</code> <em>task</em> as a child process in a separate jvm.
+      </p>
+
+      <p>The child-task inherits the environment of the parent
+      <code>TaskTracker</code>. The user can specify additional options to the
+      child-jvm via the <code>mapred.child.java.opts</code> configuration
+      parameter in the <code>JobConf</code> such as non-standard paths for the
+      run-time linker to search shared libraries via
+      <code>-Djava.library.path=&lt;&gt;</code> etc. If the
+      <code>mapred.child.java.opts</code> contains the symbol <em>@taskid@</em>
+      it is interpolated with value of <code>taskid</code> of the map/reduce
+      task.</p>
+
+      <p>Here is an example with multiple arguments and substitutions,
+      showing jvm GC logging, and start of a passwordless JVM JMX agent so that
+      it can connect with jconsole and the likes to watch child memory,
+      threads and get thread dumps.
+      It also sets the maximum heap-size of the
+      child jvm to 512MB and adds an additional path to the
+      <code>java.library.path</code> of the child-jvm.</p>
+
+      <p>
+        <code>&lt;property&gt;</code><br/>
+        <code>&lt;name&gt;mapred.child.java.opts&lt;/name&gt;</code><br/>
+        <code>&lt;value&gt;</code><br/>
+        <code>
+          -Xmx512M -Djava.library.path=/home/mycompany/lib
+          -verbose:gc -Xloggc:/tmp/@taskid@gc.log</code><br/>
+        <code>
+          -Dcom.sun.management.jmxremote.authenticate=false
+          -Dcom.sun.management.jmxremote.ssl=false</code><br/>
+        <code>&lt;/value&gt;</code><br/>
+        <code>&lt;/property&gt;</code>
+      </p>
+
+      <p>The <a href="#DistributedCache">DistributedCache</a> can also be used
+      as a rudimentary software distribution mechanism for use in the map
+      and/or reduce tasks. It can be used to distribute both jars and
+      native libraries. The
+      <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addarchivetoclasspath">
+      DistributedCache.addArchiveToClassPath(Path, Configuration)</a> or
+      <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addfiletoclasspath">
+      DistributedCache.addFileToClassPath(Path, Configuration)</a> api can
+      be used to cache files/jars and also add them to the <em>classpath</em>
+      of child-jvm. Similarly the facility provided by the
+      <code>DistributedCache</code> where-in it symlinks the cached files into
+      the working directory of the task can be used to distribute native
+      libraries and load them.
+      The underlying detail is that child-jvm always
+      has its <em>current working directory</em> added to the
+      <code>java.library.path</code> and hence the cached libraries can be
+      loaded via <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
+      System.loadLibrary</a> or <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
+      System.load</a>.</p>
+    </section>
+
+    <section>
      <title>Job Submission and Monitoring</title>

      <p><a href="ext:api/org/apache/hadoop/mapred/jobclient">
@@ -1260,19 +1318,20 @@
     efficiency stems from the fact that the files are only copied once
     per job and the ability to cache archives which are un-archived on
     the slaves.</p>
+
+    <p><code>DistributedCache</code> tracks the modification timestamps of
+    the cached files. Clearly the cache files should not be modified by
+    the application or externally while the job is executing.</p>

     <p><code>DistributedCache</code> can be used to distribute simple,
     read-only data/text files and more complex types such as archives and
     jars. Archives (zip files) are <em>un-archived</em> at the slave nodes.
-    Jars maybe be optionally added to the classpath of the tasks, a
-    rudimentary <em>software distribution</em> mechanism. Files have
-    <em>execution permissions</em> set. Optionally users can also direct the
-    <code>DistributedCache</code> to <em>symlink</em> the cached file(s)
-    into the working directory of the task.</p>
-
-    <p><code>DistributedCache</code> tracks the modification timestamps of
-    the cached files. Clearly the cache files should not be modified by
-    the application or externally while the job is executing.</p>
+    Optionally users can also direct the <code>DistributedCache</code> to
+    <em>symlink</em> the cached file(s) into the <code>current working
+    directory</code> of the task via the
+    <a href="ext:api/org/apache/hadoop/filecache/distributedcache/createsymlink">
+    DistributedCache.createSymlink(Path, Configuration)</a> api. Files
+    have <em>execution permissions</em> set.</p>
     </section>

     <section>
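The `@taskid@` substitution documented in the tutorial diff above (and applied by `replaceAll(javaOpts, "@taskid@", taskid)` in the TaskRunner change below) boils down to a plain string replacement. The following is a minimal, self-contained sketch of that behavior; the class name, option string, and task id here are illustrative examples, not Hadoop code.

```java
// Sketch of @taskid@ interpolation in mapred.child.java.opts: every
// occurrence of the marker is replaced with the task's id before the
// child jvm is launched. Class and values below are hypothetical.
public class TaskOptsInterpolation {

  // Replace all occurrences of 'marker' in 'opts' with 'taskId'.
  static String interpolate(String opts, String marker, String taskId) {
    return opts.replace(marker, taskId);
  }

  public static void main(String[] args) {
    // Example option string matching the tutorial's GC-logging example.
    String opts = "-Xmx512M -verbose:gc -Xloggc:/tmp/@taskid@gc.log";
    String taskId = "task_200801081232_0001_m_000000_0"; // made-up task id
    System.out.println(interpolate(opts, "@taskid@", taskId));
  }
}
```

Each map/reduce attempt thus gets a distinct GC log file, since the interpolated task id is unique per attempt.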
Modified: lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=610135&r1=610134&r2=610135&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/site.xml Tue Jan 8 12:32:29 2008
@@ -61,7 +61,11 @@
       </configuration>
     </conf>
     <filecache href="filecache/">
-      <distributedcache href="DistributedCache.html" />
+      <distributedcache href="DistributedCache.html">
+        <addarchivetoclasspath href="#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)" />
+        <addfiletoclasspath href="#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)" />
+        <createsymlink href="#createSymlink(org.apache.hadoop.conf.Configuration)" />
+      </distributedcache>
     </filecache>
     <fs href="fs/">
       <filesystem href="FileSystem.html" />

Modified: lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TaskRunner.java
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TaskRunner.java?rev=610135&r1=610134&r2=610135&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TaskRunner.java (original)
+++ lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TaskRunner.java Tue Jan 8 12:32:29 2008
@@ -293,19 +293,31 @@
     javaOpts = replaceAll(javaOpts, "@taskid@", taskid);
     String [] javaOptsSplit = javaOpts.split(" ");

-    //Add java.library.path; necessary for native-hadoop libraries
+    // Add java.library.path; necessary for loading native libraries.
+    //
+    // 1. To support native-hadoop library i.e. libhadoop.so, we add the
+    //    parent processes' java.library.path to the child.
+    // 2. We also add the 'cwd' of the task to it's java.library.path to help
+    //    users distribute native libraries via the DistributedCache.
+    // 3. The user can also specify extra paths to be added to the
+    //    java.library.path via mapred.child.java.opts.
+    //
     String libraryPath = System.getProperty("java.library.path");
-    if (libraryPath != null) {
-      boolean hasLibrary = false;
-      for(int i=0; i<javaOptsSplit.length ;i++) {
-        if(javaOptsSplit[i].startsWith("-Djava.library.path=")) {
-          javaOptsSplit[i] += sep + libraryPath;
-          hasLibrary = true;
-          break;
-        }
+    if (libraryPath == null) {
+      libraryPath = workDir.getAbsolutePath();
+    } else {
+      libraryPath += sep + workDir;
+    }
+    boolean hasUserLDPath = false;
+    for(int i=0; i<javaOptsSplit.length ;i++) {
+      if(javaOptsSplit[i].startsWith("-Djava.library.path=")) {
+        javaOptsSplit[i] += sep + libraryPath;
+        hasUserLDPath = true;
+        break;
       }
-      if(!hasLibrary)
-        vargs.add("-Djava.library.path=" + libraryPath);
+    }
+    if(!hasUserLDPath) {
+      vargs.add("-Djava.library.path=" + libraryPath);
     }
     for (int i = 0; i < javaOptsSplit.length; i++) {
       vargs.add(javaOptsSplit[i]);
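The TaskRunner change above can be summarized as: append the task's working directory to the inherited `java.library.path`, then extend a user-supplied `-Djava.library.path=` option in `mapred.child.java.opts` rather than overriding it (or add a fresh option if the user gave none). The sketch below models just that fragment in isolation; the class and method names are hypothetical, and the surrounding TaskRunner state (the rest of `vargs`, the real `workDir`) is simplified away.

```java
// Standalone sketch of the patched java.library.path handling, assuming a
// made-up class/method; mirrors the TaskRunner logic, not actual Hadoop API.
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LibraryPathMerge {

  // Build the child-jvm args related to java.library.path:
  // - libraryPath = parent's java.library.path (if any) + sep + workDir,
  //   so the task cwd is always searched for native libraries;
  // - if the user already passed -Djava.library.path=... in the split
  //   opts, extend it in place; otherwise emit a fresh option.
  static List<String> mergeLibraryPath(String[] javaOptsSplit,
                                       String parentLibraryPath,
                                       File workDir, String sep) {
    String libraryPath = (parentLibraryPath == null)
        ? workDir.getAbsolutePath()
        : parentLibraryPath + sep + workDir;
    List<String> vargs = new ArrayList<String>();
    boolean hasUserLDPath = false;
    for (int i = 0; i < javaOptsSplit.length; i++) {
      if (javaOptsSplit[i].startsWith("-Djava.library.path=")) {
        javaOptsSplit[i] += sep + libraryPath;   // extend user's option
        hasUserLDPath = true;
        break;
      }
    }
    if (!hasUserLDPath) {
      vargs.add("-Djava.library.path=" + libraryPath);
    }
    for (int i = 0; i < javaOptsSplit.length; i++) {
      vargs.add(javaOptsSplit[i]);               // rest of the child opts
    }
    return vargs;
  }

  public static void main(String[] args) {
    String[] opts = { "-Xmx512M", "-Djava.library.path=/home/mycompany/lib" };
    List<String> vargs =
        mergeLibraryPath(opts, "/usr/lib", new File("/tmp/work"), ":");
    System.out.println(vargs); // user option extended with parent path + cwd
  }
}
```

Note how this preserves the pre-patch behavior for a user-supplied option (it is appended to, never replaced) while guaranteeing the cwd is on the path even when the parent TaskTracker had no `java.library.path` at all, which is what lets `System.loadLibrary` find DistributedCache-symlinked libraries.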