On Mon, Jan 07, 2008 at 01:27:56PM -0800, Ted Dunning wrote:
>
>These sound right to me, but I have only personally used (4).  Also, in (4),
>you have to make sure that the jars are under /lib in the big fat jar.
>
>I can't comment on (3).  Perhaps there is a committer handy?  Olga?  Alan?
>Doug?
>

HADOOP-1622 isn't ready yet; the discussions are still ongoing...

Arun

>
>On 1/7/08 1:22 PM, "Lars George" <[EMAIL PROTECTED]> wrote:
>
>> Ted,
>> 
>> So we have these choices?
>> 
>> 1. Local copy of libs and setting HADOOP_CLASSPATH
>> 
>> 2. Using DistributedCache and upload the files "manually" into it.
>> 
>> 3. Add jars using the Job interface (JIRA 1622)
>> 
>> 4. Pack everything into one big fat job jar
>> 
>> Am I missing something?
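
A minimal sketch of option 1 from the list above: collect the shared jars in one directory and derive a classpath string for HADOOP_CLASSPATH. The directory and jar names are illustrative assumptions, not anything from the thread.

```shell
# Sketch of option 1: build a HADOOP_CLASSPATH value from a directory of
# shared jars. Directory and jar names are illustrative.
mkdir -p /tmp/shared-jars
touch /tmp/shared-jars/hbase.jar /tmp/shared-jars/lucene-core.jar

CP=""
for j in /tmp/shared-jars/*.jar; do
  CP="${CP}${CP:+:}${j}"        # join entries with ':'
done
echo "export HADOOP_CLASSPATH=${CP}"
```

The resulting export line would then go into hadoop-env.sh on every node, which is exactly the configuration-management burden Ted mentions below.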
>> 
>> Question: is JIRA 1622 actually usable yet? I am using a roughly
>> 14-day-old nightly developer build, so it should be in there in that case?
>> 
>> Which way would you go?
>> 
>> Lars
>> 
>> 
>> Ted Dunning wrote:
>>> Arun's comment about the DistributedCache is actually a very viable
>>> alternative (certainly one that I am about to investigate).
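
A rough sketch of what option 2 (the DistributedCache route) might look like; this is a command fragment only, since it assumes a running cluster, and all paths are illustrative. The cached jar would still have to be registered with the job via the DistributedCache API when the job is configured.

```shell
# Sketch only -- requires a running cluster; all paths are illustrative.
hadoop fs -mkdir /shared-libs
hadoop fs -put lib/hbase.jar /shared-libs/hbase.jar
# The job then points the DistributedCache at /shared-libs/hbase.jar when it
# is configured, and the framework localizes the file on each task node.
```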
>>> 
>>> 
>>> On 1/7/08 12:06 PM, "Lars George" <[EMAIL PROTECTED]> wrote:
>>> 
>>>> Ted,
>>>> 
>>>> That means going the HADOOP_CLASSPATH route, i.e. creating a separate
>>>> directory for those shared jars and then setting it once in
>>>> hadoop-env.sh. I think this will work for me too; I am in the process of
>>>> setting up a separate CONF_DIR anyway after my recent update, where I
>>>> forgot to copy a couple of files into the new tree.
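
For reference, the hadoop-env.sh line described above might look like this; it is a config fragment only, and the directory and jar names are assumptions for illustration.

```shell
# hadoop-env.sh fragment -- a sketch; directory and jar names are assumptions
export HADOOP_CLASSPATH=/opt/shared-jars/hbase.jar:/opt/shared-jars/lucene-core.jar
```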
>>>> 
>>>> I was following this:
>>>> http://www.mail-archive.com/[EMAIL PROTECTED]/msg02860.html
>>>> 
>>>> Which I could not really find on the Wiki, although the above is a
>>>> commit. Am I missing something?
>>>> 
>>>> Lars
>>>> 
>>>> 
>>>> Ted Dunning wrote:
>>>>     
>>>>> /lib is definitely the way to go.
>>>>> 
>>>>> But adding gobs and gobs of stuff there makes jobs start slowly because 
>>>>> you
>>>>> have to propagate a multi-megabyte blob to lots of worker nodes.
>>>>> 
>>>>> I would consider adding universally used jars to the hadoop class path on
>>>>> every node, but I would also expect to face configuration management
>>>>> nightmares (small ones, though) from doing this.
>>>>> 
>>>>> 
>>>>> On 1/7/08 11:50 AM, "Lars George" <[EMAIL PROTECTED]> wrote:
>>>>> 
>>>>>> Arun,
>>>>>> 
>>>>>> Ah yes, I see it now in JobClient. OK, then how are the required aux
>>>>>> libs handled? I assume a /lib inside the job jar is the only way to go?
>>>>>> 
>>>>>> I saw the discussion on the Wiki about adding HBase permanently to the
>>>>>> HADOOP_CLASSPATH, but then I also have to deploy the Lucene jar files,
>>>>>> Xerces, etc. I guess it is better if I add everything non-Hadoop to the
>>>>>> job jar's lib directory?
>>>>>> 
>>>>>> Thanks again for the help,
>>>>>> Lars
>>>>>> 
>>>>>> 
>>>>>> Arun C Murthy wrote:
>>>>>>> On Mon, Jan 07, 2008 at 08:24:36AM -0800, Lars George wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Maybe someone here can help me with a rather noob question. Where do I
>>>>>>>> have to put my custom jar to run it as a map/reduce job? Anywhere and
>>>>>>>> then specifying the HADOOP_CLASSPATH variable in hadoop-env.sh?
>>>>>>>> 
>>>>>>> Once you have your jar and submit it for your job via the *hadoop jar*
>>>>>>> command, the framework takes care of distributing the software to the
>>>>>>> nodes on which your maps/reduces are scheduled:
>>>>>>> $ hadoop jar <custom_jar> <custom_args>
>>>>>>> 
>>>>>>> The detail is that the framework copies your jar from the submission
>>>>>>> node to HDFS and then copies it onto the execution nodes.
>>>>>>> 
>>>>>>> Does
>>>>>>> http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html#Usage
>>>>>>> help?
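
Tying this to option 4 above: a sketch of the internal layout such a job jar would have before it is submitted. All class names, jar names, and paths are illustrative; the packaging and submission commands at the end require a JDK and a running cluster, so they are shown as comments.

```shell
# Sketch of option 4: the job jar's internal layout. Names are illustrative.
mkdir -p jobbuild/com/example jobbuild/lib
printf '' > jobbuild/com/example/MyJob.class   # compiled job classes at the root
printf '' > jobbuild/lib/hbase.jar             # aux jars go under lib/
printf '' > jobbuild/lib/lucene-core.jar
find jobbuild -type f | sort                   # show the layout
# Package (requires a JDK) and submit (requires a cluster):
#   jar cf myjob.jar -C jobbuild .
#   hadoop jar myjob.jar com.example.MyJob <args>
```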
>>>>>>> 
>>>>>>> Arun
>>>>>>> 
>>>>>>>> Also, since I am using the Hadoop API already from our server code, it
>>>>>>>> seems natural to launch jobs from within our code. Are there any issues
>>>>>>>> with that? I assume I have to copy the jar files first and make them
>>>>>>>> available as per my question above, but then I am ready to start the
>>>>>>>> job from my own code?
>>>>>>>> 
>>>>>>>> I have read most Wiki entries and while the actual workings are
>>>>>>>> described quite nicely, I could not find an answer to the questions
>>>>>>>> above. The demos are already in place and can be started as is without
>>>>>>>> the need of making them available.
>>>>>>>> 
>>>>>>>> Again, I apologize for being a noobie.
>>>>>>>> 
>>>>>>>> Lars
>>> 
>
