On Sep 21, 2008, at 2:05 PM, David Hall wrote:

(New to this list)

Hi,

My research group is setting up a small (20-node) cluster. All of
these machines are linked by NFS. We have a fairly entrenched
codebase/development cycle, and in particular we'd like to be able to
access user $CLASSPATHs in the forked jvms run by the Map and Reduce
tasks. However, TaskRunner.java (http://tinyurl.com/4enkg4) seems to
disallow this by specifying its own.


Loading jars over NFS might hurt once you have thousands of tasks: every task JVM would read the same jars from the NFS server, which could put too much load on it.

The better solution might be to use the DistributedCache:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache

Specifically:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
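
As a rough sketch of how an existing $CLASSPATH could feed into that approach: the helper below only picks the .jar entries out of a CLASSPATH-style string, using plain Java with no Hadoop dependency. The class name ClasspathJars and the method jarEntries are made up for illustration. The DistributedCache call appears only as a comment, since it needs a job Configuration and, as far as I know, expects the jar to already be on the job's file system (e.g. HDFS) rather than on a local or NFS path.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop.
public class ClasspathJars {

    /** Returns the entries of a CLASSPATH-style string that name .jar files. */
    static List<String> jarEntries(String classpath, String separator) {
        List<String> jars = new ArrayList<String>();
        if (classpath == null) {
            return jars; // CLASSPATH not set: nothing to ship
        }
        // Quote the separator so it is matched literally, not as a regex.
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String trimmed = entry.trim();
            // Skip empty entries and class directories; keep only jar files.
            if (trimmed.toLowerCase(Locale.ROOT).endsWith(".jar")) {
                jars.add(trimmed);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        String classpath = System.getenv("CLASSPATH");
        for (String jar : jarEntries(classpath, File.pathSeparator)) {
            // In a job driver, after copying the jar to the job's file system
            // (the target path is an assumption), each one would be registered with:
            //   DistributedCache.addFileToClassPath(new Path(pathOnDfs), conf);
            System.out.println(jar);
        }
    }
}
```

Directory entries on the classpath are skipped here; they would need to be packed into a jar or archive first (see addArchiveToClassPath above).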

Arun

Is there any easy way to "trick" hadoop into making these visible? If
not, if I were to submit a patch that would (optionally) add
$CLASSPATH to the forked jvms' classpath, would it be considered?

Thanks,
David Hall
