Arun's comment about the DistributedCache points to a very viable
alternative (certainly one that I am about to investigate).
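
If it works out, my guess (untested, and I have not checked that all of
these DistributedCache calls are present in 0.15.1) is that the job setup
would look roughly like this, with placeholder paths and class names:

  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  // Sketch only: copy the shared jar into HDFS once, then ask the
  // DistributedCache to put it on the task classpath for this job.
  JobConf conf = new JobConf(MyJob.class);
  FileSystem fs = FileSystem.get(conf);
  fs.copyFromLocalFile(new Path("lib/hbase.jar"),
                       new Path("/shared/jars/hbase.jar"));
  DistributedCache.addFileToClassPath(new Path("/shared/jars/hbase.jar"), conf);
  JobClient.runJob(conf);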


On 1/7/08 12:06 PM, "Lars George" <[EMAIL PROTECTED]> wrote:

> Ted,
> 
> That means going the HADOOP_CLASSPATH route, i.e. creating a separate
> directory for those shared jars and then setting it once in
> hadoop-env.sh. I think this will work for me too; I am in the process of
> setting up a separate CONF_DIR anyway after my recent update, where I
> forgot to copy a couple of files into the new tree.
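> 
> Concretely, what I have in mind for hadoop-env.sh is just a line like the
> following (the directory and jar names are only placeholders, and every
> node would need the same setup):
> 
>   # shared, non-Hadoop jars needed by all of our jobs
>   export HADOOP_CLASSPATH=/opt/shared-jars/hbase.jar:/opt/shared-jars/lucene-core.jar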
> 
> I was following this:
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg02860.html
> 
> I could not really find this on the Wiki, even though the above refers to
> a commit. Am I missing something?
> 
> Lars
> 
> 
> Ted Dunning wrote:
>> /lib is definitely the way to go.
>> 
>> But adding gobs and gobs of stuff there makes jobs start slowly because you
>> have to propagate a multi-megabyte blob to lots of worker nodes.
>> 
>> I would consider adding universally used jars to the Hadoop classpath on
>> every node, but I would also expect to face configuration management
>> nightmares (small ones, though) from doing this.
>> 
>> 
>> On 1/7/08 11:50 AM, "Lars George" <[EMAIL PROTECTED]> wrote:
>> 
>>   
>>> Arun,
>>> 
>>> Ah yes, I see it now in JobClient. OK, then how are the required aux
>>> libs handled? I assume a /lib inside the job jar is the only way to go?
>>> 
>>> I saw the discussion on the Wiki about adding HBase permanently to the
>>> HADOOP_CLASSPATH, but then I would also have to deploy the Lucene jar
>>> files, Xerces, etc. I guess it is better if I add everything non-Hadoop
>>> into the job jar's lib directory?
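>>> 
>>> Just so I am sure I understand the layout: the job jar I would build
>>> then looks roughly like this (the jar names are only examples)?
>>> 
>>>   myjob.jar
>>>     com/mycompany/...      (our job classes)
>>>     lib/hbase.jar          (bundled dependencies picked up at task runtime)
>>>     lib/lucene-core.jar
>>>     lib/xercesImpl.jar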
>>> 
>>> Thanks again for the help,
>>> Lars
>>> 
>>> 
>>> Arun C Murthy wrote:
>>>     
>>>> On Mon, Jan 07, 2008 at 08:24:36AM -0800, Lars George wrote:
>>>>   
>>>>       
>>>>> Hi,
>>>>> 
>>>>> Maybe someone here can help me with a rather noob question. Where do I
>>>>> have to put my custom jar to run it as a map/reduce job? Anywhere and
>>>>> then specifying the HADOOP_CLASSPATH variable in hadoop-env.sh?
>>>>> 
>>>>>     
>>>>>         
>>>> Once you have built your jar and submit it via the *hadoop jar* command,
>>>> the framework takes care of distributing it to the nodes on which your
>>>> maps/reduces are scheduled:
>>>> $ hadoop jar <custom_jar> <custom_args>
>>>> 
>>>> The detail is that the framework copies your jar from the submission
>>>> node into HDFS and then onto each execution node.
>>>> 
>>>> Does 
>>>> http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html#Usage
>>>> help?
>>>> 
>>>> Arun
>>>> 
>>>>   
>>>>       
>>>>> Also, since I am already using the Hadoop API from our server code, it
>>>>> seems natural to launch jobs from within our code. Are there any issues
>>>>> with that? I assume I have to copy the jar files first and make them
>>>>> available as per my question above, but then I am ready to start the
>>>>> job from my own code?
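>>>>> 
>>>>> Roughly what I have in mind is the following (just an untested sketch,
>>>>> with placeholder class, job and jar names):
>>>>> 
>>>>>   import org.apache.hadoop.mapred.JobClient;
>>>>>   import org.apache.hadoop.mapred.JobConf;
>>>>> 
>>>>>   // configure and submit the job from inside our server process
>>>>>   JobConf conf = new JobConf(MyMapReduceJob.class);
>>>>>   conf.setJobName("index-build");
>>>>>   // point the framework at the jar containing our map/reduce classes
>>>>>   conf.setJar("/path/to/myjob.jar");
>>>>>   JobClient.runJob(conf); // blocks until the job completes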
>>>>> 
>>>>> I have read most of the Wiki entries, and while the actual workings are
>>>>> described quite nicely, I could not find an answer to the questions
>>>>> above. The demos are already in place and can be started as is, without
>>>>> needing to make them available first.
>>>>> 
>>>>> Again, I apologize for being a noobie.
>>>>> 
>>>>> Lars
>>>>>     
>>>>>         
>>>>   
>>>>       
>> 
>>   
