Arun's comment about the DistributedCache is actually a very viable alternative (certainly one that I am about to investigate).
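If it helps, this is roughly what I intend to try (a sketch only, untested; the DistributedCache classpath helpers may not exist under exactly these names in every release, and the /shared/lib paths are just placeholders):

import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class SharedJarsExample {
    // Register jars that were copied into HDFS once (placeholder paths).
    // The framework localizes them on each worker node and puts them on the
    // task classpath, so they don't have to travel inside every job jar.
    public static void addSharedJars(JobConf conf) throws IOException {
        DistributedCache.addFileToClassPath(new Path("/shared/lib/lucene-core.jar"), conf);
        DistributedCache.addFileToClassPath(new Path("/shared/lib/xercesImpl.jar"), conf);
        DistributedCache.addFileToClassPath(new Path("/shared/lib/hbase.jar"), conf);
    }
}

The attraction over stuffing everything into the job jar's /lib is that the big blob only has to be pushed into HDFS once, rather than shipped with every submission.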
On 1/7/08 12:06 PM, "Lars George" <[EMAIL PROTECTED]> wrote:

> Ted,
>
> That means going the HADOOP_CLASSPATH route, i.e. creating a separate
> directory for those shared jars and then setting it once in
> hadoop-env.sh. I think this will work for me too; I am in the process of
> setting up a separate CONF_DIR anyway after my recent update, where I
> forgot to copy a couple of files into the new tree.
>
> I was following this:
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg02860.html
>
> which I could not really find on the Wiki, even though the above refers
> to a commit. Am I missing something?
>
> Lars
>
>
> Ted Dunning wrote:
>> /lib is definitely the way to go.
>>
>> But adding gobs and gobs of stuff there makes jobs start slowly, because you
>> have to propagate a multi-megabyte blob to lots of worker nodes.
>>
>> I would consider adding universally used jars to the Hadoop classpath on
>> every node, but I would also expect to face configuration management
>> nightmares (small ones, though) from doing this.
>>
>>
>> On 1/7/08 11:50 AM, "Lars George" <[EMAIL PROTECTED]> wrote:
>>
>>> Arun,
>>>
>>> Ah yes, I see it now in JobClient. OK, then how are the required aux
>>> libs handled? I assume a /lib inside the job jar is the only way to go?
>>>
>>> I saw the discussion on the Wiki about adding HBase permanently to the
>>> HADOOP_CLASSPATH, but then I would also have to deploy the Lucene jar
>>> files, Xerces, etc. I guess it is better if I add everything non-Hadoop
>>> into the job jar's lib directory?
>>>
>>> Thanks again for the help,
>>> Lars
>>>
>>>
>>> Arun C Murthy wrote:
>>>> On Mon, Jan 07, 2008 at 08:24:36AM -0800, Lars George wrote:
>>>>> Hi,
>>>>>
>>>>> Maybe someone here can help me with a rather noob question. Where do I
>>>>> have to put my custom jar to run it as a map/reduce job? Anywhere, and
>>>>> then specify the HADOOP_CLASSPATH variable in hadoop-env.sh?
>>>>>
>>>> Once you have your jar and submit it for your job via the *hadoop jar*
>>>> command, the framework takes care of distributing the software to the
>>>> nodes on which your maps/reduces are scheduled:
>>>> $ hadoop jar <custom_jar> <custom_args>
>>>>
>>>> The detail is that the framework copies your jar from the submission node
>>>> to HDFS and then copies it onto the execution nodes.
>>>>
>>>> Does
>>>> http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html#Usage
>>>> help?
>>>>
>>>> Arun
>>>>
>>>>> Also, since I am already using the Hadoop API from our server code, it
>>>>> seems natural to launch jobs from within our code. Are there any issues
>>>>> with that? I assume I have to copy the jar files first and make them
>>>>> available as per my question above, but then I am ready to start it from
>>>>> my own code?
>>>>>
>>>>> I have read most Wiki entries, and while the actual workings are
>>>>> described quite nicely, I could not find an answer to the questions
>>>>> above. The demos are already in place and can be started as is without
>>>>> the need to make them available first.
>>>>>
>>>>> Again, I apologize for being a noobie.
>>>>>
>>>>> Lars
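And for the "launch from our own code" question further down the thread, the shape of it is roughly this (again only a sketch against the old JobConf/JobClient API, untested; the jar path, job name, input/output paths and the identity mapper/reducer are placeholders, and setter names shift a little between releases):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class LaunchFromServerCode {
    public static void runJob() throws Exception {
        JobConf conf = new JobConf();
        conf.setJobName("my-server-side-job");      // placeholder name

        // Tell the framework which jar to ship; its internal lib/ directory
        // can carry Lucene, Xerces, etc. The framework copies this jar to
        // HDFS and then onto each node where tasks are scheduled.
        conf.setJar("/opt/myapp/myapp-job.jar");    // placeholder path
        // or: conf.setJarByClass(SomeClassInsideThatJar.class);

        conf.setMapperClass(IdentityMapper.class);  // real classes go here
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path("/user/lars/input"));    // placeholder
        FileOutputFormat.setOutputPath(conf, new Path("/user/lars/output"));  // placeholder

        JobClient.runJob(conf);   // blocks until the job finishes
    }
}

Once the jar is named there, nothing else has to be "made available" by hand; the copy-to-HDFS-and-out-to-the-nodes step Arun describes happens for a programmatic submission just as it does for the hadoop jar command.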