That certainly works, though if you plan to upgrade the underlying library, you'll find that copying files with the correct versions into $HADOOP_HOME/lib rapidly gets tedious, and subtle mistakes (e.g., forgetting one machine) can lead to frustration.
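Just to illustrate: keeping every node in sync usually means something like the following (a rough sketch; it assumes your slave hostnames are listed in conf/slaves and that $HADOOP_HOME is the same path on every machine):

  # push the library jars to every node (filenames are placeholders)
  for host in $(cat conf/slaves); do
    scp lib/pellet-*.jar $host:$HADOOP_HOME/lib/
  done

and the TaskTrackers typically need a restart before the new jars show up on the task classpath.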
When you consider that you're using a Hadoop cluster to move around GBs of data even on the low end, the difference between a 10 MB and a 20 MB job jar starts to look meaningless. Putting other jars in a lib/ directory inside your job jar keeps the versions consistent and doesn't clutter up a shared directory on your cluster (assuming there are other users).

- Aaron

On Tue, Apr 14, 2009 at 11:15 AM, Farhan Husain <[email protected]> wrote:
> Hello,
>
> I found another solution for this. I just pasted all the required jar
> files into the lib folder of each Hadoop node. This way the job jar is
> not too big and takes less time to distribute across the cluster.
>
> Thanks,
> Farhan
>
> On Mon, Apr 13, 2009 at 7:22 PM, Nick Cen <[email protected]> wrote:
>
> > Create a directory called 'lib' in your project's root dir, then put
> > all the 3rd party jars in it.
> >
> > 2009/4/14 Farhan Husain <[email protected]>
> >
> > > Hello,
> > >
> > > I am trying to use the Pellet library for some OWL inferencing in my
> > > mapper class, but I can't find a way to bundle the library jar files
> > > into my job jar. I am exporting my project as a jar file from the
> > > Eclipse IDE. Will it work if I create the jar manually and include
> > > all the jar files the Pellet library has? Is there any simpler way
> > > to include 3rd party library jars in a Hadoop job jar? Without them
> > > I am getting a ClassNotFoundException.
> > >
> > > Thanks,
> > > Farhan
> >
> > --
> > http://daily.appspot.com/food/
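P.S. Here's roughly what a job jar with bundled dependencies looks like, plus one way to build it with the stock jar tool (all paths and class names below are made up):

  myjob.jar
      com/example/MyMapper.class     <-- your compiled classes
      lib/pellet.jar                 <-- 3rd party jars go under lib/
      lib/some-other-dep.jar

  mkdir -p build/lib
  cp /path/to/pellet/*.jar build/lib/
  cp -r bin/com build/               # bin/ being Eclipse's output folder
  (cd build && jar cf ../myjob.jar .)

Hadoop unpacks the job jar on each node and adds everything under lib/ to the task classpath, so no extra configuration is needed.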
