Shuja, I haven't tried this, but from what I've read it seems you
could just add all the jars required by your Mapper and Reducer to
HDFS and then add them to the classpath in your run() method like
this:

// here "job" is the JobConf (a Configuration object) for the job
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);

I think that's all there is to it, but like I said, I haven't tried
it. Just be sure your run() method isn't in the same class as your
mapper/reducer if they import packages from any of the distributed
cache jars.
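
To make that concrete, here's a rough, untested sketch of what the
whole driver could look like (the class name, jar path, and job name
are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();

        // The jar has to be in HDFS already, e.g. uploaded with:
        //   hadoop fs -put mylib.jar /myapp/mylib.jar
        // Call this before constructing the Job, since Job copies conf.
        DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), conf);

        Job job = new Job(conf, "my-job");
        job.setJarByClass(MyDriver.class);
        // ... set mapper/reducer classes and input/output paths as usual ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}

Going through Configured/Tool/ToolRunner like this is also what makes
generic options such as -libjars work, per James' guideline further
down the thread.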


On Mon, Apr 4, 2011 at 11:40 AM, James Seigel <[email protected]> wrote:
> James’ quick-and-dirty, get-your-job-running guideline:
>
> -libjars <-- for jars you want accessible by the mappers and reducers
> classpath or bundled in the main jar <-- for jars you want accessible to the runner
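>
> (Note: -libjars is only honored when the runner parses the generic
> options, e.g. by going through ToolRunner as in the sketch above. A
> made-up invocation, just to show the shape:
>
>   hadoop jar myjob.jar com.example.MyDriver -libjars mylib.jar in out
>
> where com.example.MyDriver implements Tool.)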
>
> Cheers
> James.
>
>
>
> On 2011-04-04, at 12:31 PM, Shuja Rehman wrote:
>
>> Well... I think putting them in the distributed cache is a good idea. Do
>> you have any working example of how to put extra jars in the distributed
>> cache and how to make those jars available to the job?
>> Thanks
>>
>> On Mon, Apr 4, 2011 at 10:20 PM, Mark Kerzner <[email protected]> wrote:
>>
>>> I think you can put them either in your jar or in the distributed cache.
>>>
>>> As Allen pointed out, my idea of putting them into the Hadoop lib
>>> directory was wrong.
>>>
>>> Mark
>>>
>>> On Mon, Apr 4, 2011 at 12:16 PM, Marco Didonna <[email protected]> wrote:
>>>
>>>> On 04/04/2011 07:06 PM, Allen Wittenauer wrote:
>>>>
>>>>>
>>>>> On Apr 4, 2011, at 8:06 AM, Shuja Rehman wrote:
>>>>>
>>>>>> Hi All
>>>>>>
>>>>>> I have created a MapReduce job, and to run it on the cluster I have
>>>>>> bundled all the jars (Hadoop, HBase, etc.) into a single jar, which
>>>>>> increases the overall file size. During development I need to copy
>>>>>> this complete file again and again, which is very time consuming.
>>>>>> Is there any way that I can copy just the program jar and skip the
>>>>>> lib files? I am using NetBeans to develop the program.
>>>>>>
>>>>>> Kindly let me know how to solve this issue?
>>>>>>
>>>>>
>>>>>       This was in the FAQ, but in a non-obvious place.  I've updated it
>>>>> to be more visible (hopefully):
>>>>>
>>>>>
>>>>> http://wiki.apache.org/hadoop/FAQ#How_do_I_submit_extra_content_.28jars.2C_static_files.2C_etc.29_for_my_job_to_use_during_runtime.3F
>>>>>
>>>>
>>>> Does the same apply to jars containing libraries? Let's suppose I need
>>>> lucene-core.jar to run my project. Can I put this jar into my job jar
>>>> and have Hadoop "see" Lucene's classes? Or should I use the distributed
>>>> cache?
>>>>
>>>> MD
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Regards
>> Shuja-ur-Rehman Baig
>> <http://pk.linkedin.com/in/shujamughal>
>
>
