On Sep 21, 2008, at 2:05 PM, David Hall wrote:
(New to this list) Hi, My research group is setting up a small (20-node) cluster. All of these machines are linked by NFS. We have a fairly entrenched codebase/development cycle, and in particular we'd like to be able to access user $CLASSPATHs in the forked JVMs run by the Map and Reduce tasks. However, TaskRunner.java (http://tinyurl.com/4enkg4) seems to disallow this by specifying its own.
Loading jars over NFS could hurt once you have thousands of tasks, since every task JVM would read them from the same NFS server and might put too much load on it.
The better solution might be to use the DistributedCache:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache

Specifically:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)

Arun
Is there any easy way to "trick" Hadoop into making these visible? If not, if I were to submit a patch that would (optionally) add $CLASSPATH to the forked JVMs' classpath, would it be considered?

Thanks,
David Hall
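
For illustration only, here is a rough sketch of what the optional behaviour described above could look like: the framework's own classpath entries come first, and the user's $CLASSPATH entries are appended only when a switch is on. This is not the actual TaskRunner.java code. The class name ChildClasspath, the method build, and the includeUser flag are all hypothetical, and the jar names in main are placeholders.

```java
import java.io.File;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Hadoop: shows one way a child JVM's
// classpath could optionally be extended with the user's $CLASSPATH.
public class ChildClasspath {

    // Framework entries always come first; user entries are appended only
    // when includeUser is true. Blank and duplicate entries are dropped.
    public static String build(String frameworkClasspath,
                               String userClasspath,
                               boolean includeUser) {
        Set<String> entries = new LinkedHashSet<>();
        addEntries(entries, frameworkClasspath);
        if (includeUser) {
            addEntries(entries, userClasspath);
        }
        return String.join(File.pathSeparator, entries);
    }

    // Splits a classpath string on the platform separator and keeps
    // the non-empty entries in their original order.
    private static void addEntries(Set<String> out, String classpath) {
        if (classpath == null) {
            return; // e.g. $CLASSPATH is not set
        }
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
    }

    public static void main(String[] args) {
        // Placeholder framework entries, for demonstration only.
        String framework = "hadoop-core.jar" + File.pathSeparator + "job.jar";
        String user = System.getenv("CLASSPATH"); // may be null
        System.out.println(build(framework, user, true));
    }
}
```

Putting the framework entries first means a user jar cannot shadow a class the framework itself supplies. Note that this only changes which paths the child JVM is told about; the jars would still be read over NFS by every task, so the load concern raised above still applies.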
