On Fri, Sep 26, 2008 at 7:50 AM, Samuel Guo <[EMAIL PROTECTED]> wrote:
> maybe you can use
> bin/hadoop jar -libjars ${your-depends-jars} your.mapred.jar args
>
> see details:
> http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/mapred/JobShell.html

Most of our classes aren't in jars. I suppose it wouldn't be too bad
to tell ant to jar them up, but with the hack it's easy enough not to
bother.

-- David

>
> On Thu, Sep 25, 2008 at 12:26 PM, David Hall <[EMAIL PROTECTED]> wrote:
>
>> On Sun, Sep 21, 2008 at 9:41 PM, David Hall <[EMAIL PROTECTED]> wrote:
>> > On Sun, Sep 21, 2008 at 9:35 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>> >>
>> >> On Sep 21, 2008, at 2:05 PM, David Hall wrote:
>> >>
>> >>> (New to this list)
>> >>>
>> >>> Hi,
>> >>>
>> >>> My research group is setting up a small (20-node) cluster. All of
>> >>> these machines are linked by NFS. We have a fairly entrenched
>> >>> codebase/development cycle, and in particular we'd like to be able to
>> >>> access user $CLASSPATHs in the forked jvms run by the Map and Reduce
>> >>> tasks. However, TaskRunner.java (http://tinyurl.com/4enkg4) seems to
>> >>> disallow this by specifying its own.
>> >>>
>> >>
>> >> Reading jars over NFS might hurt if you have thousands of tasks,
>> >> since together they can put too much load on it.
>> >>
>> >> The better solution might be to use the DistributedCache:
>> >>
>> >> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache
>> >>
>> >> Specifically:
>> >>
>> >> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
>> >>
>> >> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
>> >>
>> >> Arun
>> >
>> > Good point. I hadn't thought of that, but at the moment we're dealing
>> > with barrier-to-adoption rather than efficiency. We'll have to go back
>> > to PBS if we can't get users (read: picky PhD students) on board. I'd
>> > rather avoid that scenario...
>> >
>> > In the meantime, I think I figured out a hack that I'm going to try.
>>
>> In case anyone's curious, the hack is to create a jar file with a
>> manifest that has the Class-Path field set to all the directories and
>> jars you want, and to put that in the lib/ folder of another jar, and
>> pass that final jar in as the User Jar to a job.
>>
>> Works like a charm. :-)
>>
>> -- David
>>
>> >
>> > Thanks!
>> >
>> > -- David
>> >
>> >>
>> >>> Is there any easy way to "trick" hadoop into making these visible? If
>> >>> not, if I were to submit a patch that would (optionally) add
>> >>> $CLASSPATH to the forked jvms' classpath, would it be considered?
>> >>>
>> >>> Thanks,
>> >>> David Hall
>> >>
>> >>
>> >
>>
>
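
For anyone who wants to try the hack described in the thread (a jar whose
manifest Class-Path lists the directories and jars, placed in lib/ of the
jar passed as the User Jar), here is a minimal sketch using only
java.util.jar. The class name ClassPathJarHack, the entry name
lib/classpath.jar, and the paths in the usage line are made up for
illustration. A real job jar would also hold the job's own classes, which
this sketch leaves out, and whether the task JVMs pick up the nested
manifest is taken from David's report, not verified here. Directory
entries in a manifest Class-Path need a trailing slash.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClassPathJarHack {

    // Build, in memory, a jar that contains nothing but a manifest whose
    // Class-Path attribute lists the given directories and jars.
    static byte[] buildClassPathJar(String... entries) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, or the manifest is written out empty.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Class-Path entries are separated by single spaces.
        attrs.put(Attributes.Name.CLASS_PATH, String.join(" ", entries));

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // The constructor writes META-INF/MANIFEST.MF; no other entries are needed.
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // intentionally empty
        }
        return bytes.toByteArray();
    }

    // Write the outer "user jar" with the class-path jar stored under lib/.
    static void writeUserJar(Path out, byte[] classPathJar) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (JarOutputStream jar =
                 new JarOutputStream(Files.newOutputStream(out), manifest)) {
            jar.putNextEntry(new JarEntry("lib/classpath.jar")); // hypothetical name
            jar.write(classPathJar);
            jar.closeEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println(
                "usage: ClassPathJarHack <out.jar> <dir-or-jar> [<dir-or-jar> ...]");
            return;
        }
        String[] entries = Arrays.copyOfRange(args, 1, args.length);
        writeUserJar(Paths.get(args[0]), buildClassPathJar(entries));
    }
}
```

Hypothetical usage:
java ClassPathJarHack user-job.jar /nfs/home/me/classes/ /nfs/home/me/lib/dep.jar
The resulting user-job.jar would then be handed to the job as its User Jar.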
