On Mon, Jan 07, 2008 at 01:27:56PM -0800, Ted Dunning wrote:
>
> These sound right to me, but I have only personally used (4). Also, in
> (4), you have to make sure that the jars are under /lib in the big fat
> jar.
>
> I can't comment on (3). Perhaps there is a committer handy? Olga? Alan?
> Doug?
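On (4): a rough sketch of what the unpacked job jar should look like. This
is only an illustration - the class and jar names are placeholders, not
anything from this thread:

myjob.jar
|-- com/example/MyMapper.class      <- your job classes at the root
|-- com/example/MyReducer.class
`-- lib/
    |-- hbase.jar                   <- everything non-Hadoop
    |-- lucene-core.jar
    `-- xercesImpl.jar

The framework unjars the job jar on each task node and puts everything
under lib/ onto the task classpath, which is why the directory name
matters.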
H-1622 isn't ready yet, discussions are still on...

Arun

> On 1/7/08 1:22 PM, "Lars George" <[EMAIL PROTECTED]> wrote:
>
>> Ted,
>>
>> So we have these choices?
>>
>> 1. Local copy of the libs and setting HADOOP_CLASSPATH
>> 2. Using the DistributedCache and uploading the files "manually" into it
>> 3. Adding jars using the Job interface (JIRA 1622)
>> 4. Packing everything into one big fat job jar
>>
>> Am I missing something?
>>
>> Question: is JIRA 1622 actually usable yet? I am using an about
>> 14-day-old nightly developer build, so that should have it in that
>> case?
>>
>> Which way would you go?
>>
>> Lars
>>
>> Ted Dunning wrote:
>>> Arun's comment about the DistributedCache is actually a very viable
>>> alternative (certainly one that I am about to investigate).
>>>
>>> On 1/7/08 12:06 PM, "Lars George" <[EMAIL PROTECTED]> wrote:
>>>
>>>> Ted,
>>>>
>>>> That means going the HADOOP_CLASSPATH route, i.e. creating a separate
>>>> directory for those shared jars and then setting it once in
>>>> hadoop-env.sh. I think this will work for me too; I am in the process
>>>> of setting up a separate CONF_DIR anyway after my recent update, where
>>>> I forgot to copy a couple of files into the new tree.
>>>>
>>>> I was following this:
>>>> http://www.mail-archive.com/[EMAIL PROTECTED]/msg02860.html
>>>>
>>>> I could not really find that on the Wiki, although the above is a
>>>> commit. Am I missing something?
>>>>
>>>> Lars
>>>>
>>>> Ted Dunning wrote:
>>>>> /lib is definitely the way to go.
>>>>>
>>>>> But adding gobs and gobs of stuff there makes jobs start slowly,
>>>>> because you have to propagate a multi-megabyte blob to lots of
>>>>> worker nodes.
>>>>>
>>>>> I would consider adding universally used jars to the Hadoop
>>>>> classpath on every node, but I would also expect to face
>>>>> configuration management nightmares (small ones, though) from doing
>>>>> this.
>>>>>
>>>>> On 1/7/08 11:50 AM, "Lars George" <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Arun,
>>>>>>
>>>>>> Ah yes, I see it now in JobClient. OK, then how are the required
>>>>>> aux libs handled? I assume a /lib inside the job jar is the only
>>>>>> way to go?
>>>>>>
>>>>>> I saw the discussion on the Wiki about adding HBase permanently to
>>>>>> the HADOOP_CLASSPATH, but then I also have to deploy the Lucene jar
>>>>>> files, Xerces etc. I guess it is better if I add everything
>>>>>> non-Hadoop into the job jar's lib directory?
>>>>>>
>>>>>> Thanks again for the help,
>>>>>> Lars
>>>>>>
>>>>>> Arun C Murthy wrote:
>>>>>>> On Mon, Jan 07, 2008 at 08:24:36AM -0800, Lars George wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Maybe someone here can help me with a rather noob question. Where
>>>>>>>> do I have to put my custom jar to run it as a map/reduce job?
>>>>>>>> Anywhere, and then specify it via the HADOOP_CLASSPATH variable
>>>>>>>> in hadoop-env.sh?
>>>>>>>>
>>>>>>> Once you have your jar and submit it for your job via the *hadoop
>>>>>>> jar* command, the framework takes care of distributing the
>>>>>>> software to the nodes on which your maps/reduces are scheduled:
>>>>>>> $ hadoop jar <custom_jar> <custom_args>
>>>>>>>
>>>>>>> The detail is that the framework copies your jar from the
>>>>>>> submission node to HDFS and then copies it onto the execution
>>>>>>> node.
>>>>>>>
>>>>>>> Does
>>>>>>> http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html#Usage
>>>>>>> help?
>>>>>>>
>>>>>>> Arun
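Coming back to (2), since Ted is about to investigate it: below is a
minimal sketch of the manual DistributedCache route, assuming the 0.15
API. The HDFS paths, jar name and class name are made up for illustration:

// Sketch of option (2): push a shared jar to HDFS once, then register it
// with the DistributedCache so the framework localizes it on every task
// node. All paths and names here are placeholders.
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CacheSetup.class);

    // Upload the jar "manually" -- once per deployment, not per job.
    FileSystem fs = FileSystem.get(conf);
    fs.copyFromLocalFile(new Path("lib/lucene-core.jar"),
                         new Path("/cache/lucene-core.jar"));

    // Register it so each task node gets a local copy before the first
    // task runs.
    DistributedCache.addCacheFile(new URI("/cache/lucene-core.jar"), conf);

    // Tasks can find the local copies via
    // DistributedCache.getLocalCacheFiles(conf); wiring them onto the
    // task classpath from there is what HADOOP-1622 is meant to automate.
  }
}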
>>>>>>>> Also, since I am using the Hadoop API already from our server
>>>>>>>> code, it seems natural to launch jobs from within our own code.
>>>>>>>> Are there any issues with that? I assume I have to copy the jar
>>>>>>>> files first and make them available as per my question above, but
>>>>>>>> then I am ready to start it from my own code?
>>>>>>>>
>>>>>>>> I have read most Wiki entries, and while the actual workings are
>>>>>>>> described quite nicely, I could not find an answer to the
>>>>>>>> questions above. The demos are already in place and can be
>>>>>>>> started as-is without the need to make them available.
>>>>>>>>
>>>>>>>> Again, I apologize for being a noobie.
>>>>>>>>
>>>>>>>> Lars
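On that last question - launching jobs from your own server code: the
*hadoop jar* command just runs your jar's main class, which typically
calls JobClient itself, so calling JobClient directly from application
code works too. A minimal sketch against the 0.15 API, again with
invented paths and names:

// Sketch of submitting a job from application code instead of the
// `hadoop jar` command. The job jar must still exist on local disk so
// the framework can ship it to HDFS and on to the task nodes.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitFromServer {
  public static void runMyJob() throws Exception {
    JobConf conf = new JobConf();
    conf.setJobName("from-server-code");

    // Point at the job jar explicitly, since we are not going through
    // `hadoop jar`; new JobConf(SomeClassInTheJar.class) would infer it.
    conf.setJar("/opt/myapp/myjob.jar");

    conf.setInputPath(new Path("/user/lars/input"));
    conf.setOutputPath(new Path("/user/lars/output"));

    // Mapper and reducer default to the identity classes here; a real
    // job would set conf.setMapperClass(...) etc. to classes inside the
    // jar.

    JobClient.runJob(conf); // blocks until the job completes
  }
}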