These sound right to me, but I have only personally used (4). Also, in (4), you have to make sure that the jars are under /lib in the big fat jar.
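For example, a layout like the following is what has worked for me; the framework unpacks the job jar on the task nodes and puts everything under lib/ on the task classpath (all names here are made up):

    myjob.jar
        com/mycompany/MyJob.class
        lib/hbase.jar
        lib/lucene-core.jar
        lib/xercesImpl.jar

built with something like:

    $ jar cvf myjob.jar -C classes . lib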
I can't comment on (3). Perhaps there is a committer handy? Olga? Alan? Doug?
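For (2), what I am about to try looks roughly like this. This is untested; it assumes the jars already sit in HDFS and that your build has the DistributedCache classpath helpers, and the paths and class name are made up:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CachedJars {
      // Adds jars that were previously uploaded to HDFS, e.g. via
      //   $ hadoop dfs -put hbase.jar /shared/hbase.jar
      // to the classpath of every task of this job.
      public static void addSharedJars(JobConf conf) throws IOException {
        DistributedCache.addFileToClassPath(new Path("/shared/hbase.jar"), conf);
        DistributedCache.addFileToClassPath(new Path("/shared/lucene-core.jar"), conf);
      }
    }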
On 1/7/08 1:22 PM, "Lars George" <[EMAIL PROTECTED]> wrote:

> Ted,
>
> So we have these choices?
>
> 1. Keeping a local copy of the libs and setting HADOOP_CLASSPATH
> 2. Using the DistributedCache and uploading the files "manually" into it
> 3. Adding jars using the Job interface (JIRA 1622)
> 4. Packing everything into one big fat job jar
>
> Am I missing something?
>
> Question: is JIRA 1622 actually usable yet? I am using a nightly
> developer build that is about 14 days old, so that should already
> include it, right?
>
> Which way would you go?
>
> Lars
>
>
> Ted Dunning wrote:
>> Arun's comment about the DistributedCache is actually a very viable
>> alternative (certainly one that I am about to investigate).
>>
>>
>> On 1/7/08 12:06 PM, "Lars George" <[EMAIL PROTECTED]> wrote:
>>
>>> Ted,
>>>
>>> That means going the HADOOP_CLASSPATH route, i.e. creating a
>>> separate directory for those shared jars and then setting it once
>>> in hadoop-env.sh. I think this will work for me too; I am in the
>>> process of setting up a separate CONF_DIR anyway after my recent
>>> update, where I forgot to copy a couple of files into the new tree.
>>>
>>> I was following this:
>>> http://www.mail-archive.com/[EMAIL PROTECTED]/msg02860.html
>>>
>>> I could not really find it on the Wiki, though, although the above
>>> is a commit. Am I missing something?
>>>
>>> Lars
>>>
>>>
>>> Ted Dunning wrote:
>>>> /lib is definitely the way to go.
>>>>
>>>> But adding gobs and gobs of stuff there makes jobs start slowly
>>>> because you have to propagate a multi-megabyte blob to lots of
>>>> worker nodes.
>>>>
>>>> I would consider adding universally used jars to the Hadoop
>>>> classpath on every node, but I would also expect to face
>>>> configuration management nightmares (small ones, though) from
>>>> doing this.
>>>>
>>>>
>>>> On 1/7/08 11:50 AM, "Lars George" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Arun,
>>>>>
>>>>> Ah yes, I see it now in JobClient. OK, then how are the required
>>>>> auxiliary libs handled? I assume a /lib inside the job jar is the
>>>>> only way to go?
>>>>>
>>>>> I saw the discussion on the Wiki about adding HBase permanently
>>>>> to the HADOOP_CLASSPATH, but then I also have to deploy the
>>>>> Lucene jar files, Xerces, etc. I guess it is better if I add
>>>>> everything non-Hadoop into the job jar's lib directory?
>>>>>
>>>>> Thanks again for the help,
>>>>> Lars
>>>>>
>>>>>
>>>>> Arun C Murthy wrote:
>>>>>> On Mon, Jan 07, 2008 at 08:24:36AM -0800, Lars George wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Maybe someone here can help me with a rather noob question:
>>>>>>> where do I have to put my custom jar to run it as a map/reduce
>>>>>>> job? Anywhere, and then specify the HADOOP_CLASSPATH variable
>>>>>>> in hadoop-env.sh?
>>>>>>>
>>>>>> Once you have your jar and submit it for your job via the
>>>>>> *hadoop jar* command, the framework takes care of distributing
>>>>>> the software to the nodes on which your maps/reduces are
>>>>>> scheduled:
>>>>>>
>>>>>>   $ hadoop jar <custom_jar> <custom_args>
>>>>>>
>>>>>> The detail is that the framework copies your jar from the
>>>>>> submission node to HDFS and then copies it onto the execution
>>>>>> nodes.
>>>>>>
>>>>>> Does
>>>>>> http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html#Usage
>>>>>> help?
>>>>>>
>>>>>> Arun
>>>>>>
>>>>>>> Also, since I am using the Hadoop API already from our server
>>>>>>> code, it seems natural to launch jobs from within our code.
>>>>>>> Are there any issues with that? I assume I have to copy the
>>>>>>> jar files first and make them available as per my question
>>>>>>> above, but then I am ready to start them from my own code?
>>>>>>>
>>>>>>> I have read most Wiki entries, and while the actual workings
>>>>>>> are described quite nicely, I could not find an answer to the
>>>>>>> questions above. The demos are already in place and can be
>>>>>>> started as is, without the need to make them available first.
>>>>>>>
>>>>>>> Again, I apologize for being a noobie.
>>>>>>>
>>>>>>> Lars
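On Lars's last question above (launching jobs from within your own
server code): that should work fine. It boils down to building a
JobConf that points at your job jar and calling JobClient.runJob, much
like the WordCount example in the r0.15.1 tutorial Arun linked. A
rough, untested sketch against the 0.15-era API, where MyJob, MyMapper,
MyReducer and the paths are placeholders:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class LaunchFromServer {
      public static void launch() throws IOException {
        // JobConf(Class) lets the framework locate the jar that
        // contains MyJob; alternatively call conf.setJar("myjob.jar").
        JobConf conf = new JobConf(MyJob.class);
        conf.setJobName("myjob");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(MyMapper.class);
        conf.setReducerClass(MyReducer.class);
        conf.setInputPath(new Path("/user/lars/input"));   // placeholder paths
        conf.setOutputPath(new Path("/user/lars/output"));
        JobClient.runJob(conf);  // blocks until the job finishes
      }
    }

The only extra requirement is that the job jar (with its /lib) exists
on the submitting machine, so the framework can ship it exactly as
Arun describes above.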