I agree. It will eventually get us in trouble. That's why we want to get the -libjars option to work, but it's not working... arrrghhh... It's the simplest things in engineering that take the longest time... :-)

Can you see why this may not work?

    /Users/xyz/hadoop-0.20.2/bin/hadoop jar /Users/xyz/modules/something/target/my.jar com.xyz.common.MyMapReduce -libjars /Users/xyz/modules/something/target/my.jar, /Users/xyz/avro-tools-1.5.4.jar
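Two things stand out in that command line. First, the space after the comma splits the -libjars value across two shell arguments; the list must be comma-separated with no spaces, and the job jar itself does not need to be listed again. Second, -libjars is a "generic option" that only takes effect when the main class hands its arguments to GenericOptionsParser, which is what ToolRunner does for you. A minimal sketch of such a driver, assuming the new (org.apache.hadoop.mapreduce) API; the mapper/reducer wiring is left as placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyMapReduce extends Configured implements Tool {

        public int run(String[] args) throws Exception {
            // getConf() already reflects -libjars, -D, -files, etc., because
            // ToolRunner ran GenericOptionsParser before calling run().
            Job job = new Job(getConf(), "my job");
            job.setJarByClass(MyMapReduce.class);
            // job.setMapperClass(...); job.setReducerClass(...); // placeholders
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner strips the generic options and passes only the
            // remaining arguments into run().
            System.exit(ToolRunner.run(new Configuration(), new MyMapReduce(), args));
        }
    }

With a driver like that, the invocation would look like this (no space after the comma, and only the extra jar in -libjars):

    /Users/xyz/hadoop-0.20.2/bin/hadoop jar /Users/xyz/modules/something/target/my.jar com.xyz.common.MyMapReduce -libjars /Users/xyz/avro-tools-1.5.4.jar <input> <output>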
On Wed, Nov 16, 2011 at 8:51 AM, Friso van Vollenhoven <
fvanvollenho...@xebia.com> wrote:

> Are you using Maven's default jar-with-dependencies assembly? That layout
> works too, but it will eventually give you problems when you have
> different classes with the same package and name.
>
> Java jar files are regular ZIP files, and ZIP files can contain duplicate
> entries. I don't know whether your packaging creates duplicates, but if
> it does, that could be the cause of your problem.
>
> Try checking your jar for a duplicate license dir in the META-INF
> (something like: unzip -l <your-jar-name>.jar | awk '{print $4}' | sort |
> uniq -d)
>
> Friso
>
> On 16 nov. 2011, at 17:33, Something Something wrote:
>
> Thanks Bejoy & Friso. When I use the all-in-one jar file created by
> Maven, I get this:
>
> Mkdirs failed to create
> /Users/xyz/hdfs/hadoop-unjar4743660161930001886/META-INF/license
>
> Do you recall coming across this? Our 'all-in-one' jar is not exactly
> what you described: it doesn't contain any JARs, but it has all the
> classes from all the dependent JARs.
>
> On Wed, Nov 16, 2011 at 7:59 AM, Friso van Vollenhoven <
> fvanvollenho...@xebia.com> wrote:
>
>> We usually package our jobs as a single jar containing a /lib directory
>> that holds all the other jars the job code depends on. Hadoop understands
>> this layout when the job is run with 'hadoop jar'. So the jar layout
>> would be something like:
>>
>> /META-INF/manifest.mf
>> /com/mypackage/MyMapperClass.class
>> /com/mypackage/MyReducerClass.class
>> /lib/dependency1.jar
>> /lib/dependency2.jar
>> etc.
>>
>> If you use Maven or some other build tool with dependency management,
>> you can usually produce this jar as part of your build. We also have
>> Maven write the main class to the manifest, so there is no need to type
>> it. For us, submitting a job looks like:
>>
>> hadoop jar jar-with-all-deps-in-lib.jar arg1 arg2 argN
>>
>> Then Hadoop takes care of submitting and distributing, etc. Of course
>> you pay the penalty of always sending all of your dependencies over the
>> wire (the job jar gets replicated to 10 machines by default).
>> Pre-distributing sounds tedious and error-prone to me. What if different
>> jobs require different versions of the same dependency?
>>
>> HTH,
>> Friso
>>
>> On 16 nov. 2011, at 15:42, Something Something wrote:
>>
>> Bejoy - Thanks for the reply. The '-libjars' option is not working for
>> me with 'hadoop jar'. Also, per the documentation (
>> http://hadoop.apache.org/common/docs/current/commands_manual.html#jar):
>>
>> Generic Options
>>
>> The following options are supported by dfsadmin, fs, fsck, job and
>> fetchdt.
>>
>> Does it work for you? If it does, please let me know. "Pre-distributing"
>> definitely works, but is that the best way? If you have a big cluster and
>> the jars change often, it will be time-consuming.
>>
>> Also, how does Pig do it? We update Pig UDFs often and put them only on
>> the 'client' machine (the machine that starts the Pig job), and the UDFs
>> become available to all machines in the cluster - automagically! Is Pig
>> doing the pre-distributing for us?
>>
>> Thanks for your patience & help with our questions.
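An aside on the "Mkdirs failed to create .../META-INF/license" error further up the thread: a commonly reported cause on Mac OS X is a case collision inside the jar (e.g. a META-INF/LICENSE file alongside a META-INF/license directory), which 'hadoop jar' trips over when it unpacks the jar onto a case-insensitive filesystem. Here is a rough Java equivalent of Friso's unzip/awk one-liner that also flags such collisions; the class name is our own invention, not part of Hadoop:

    import java.util.Enumeration;
    import java.util.HashSet;
    import java.util.Locale;
    import java.util.Set;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public class DuplicateJarEntries {
        public static void main(String[] args) throws Exception {
            ZipFile zip = new ZipFile(args[0]);  // e.g. target/my.jar
            Set<String> exact = new HashSet<String>();
            Set<String> folded = new HashSet<String>();
            for (Enumeration<? extends ZipEntry> e = zip.entries(); e.hasMoreElements();) {
                String name = e.nextElement().getName();
                if (!exact.add(name)) {
                    // ZIPs may legally contain byte-for-byte duplicate entries.
                    System.out.println("exact duplicate: " + name);
                } else if (!folded.add(name.toLowerCase(Locale.ROOT))) {
                    // Collides only on case-insensitive filesystems like HFS+.
                    System.out.println("case collision:  " + name);
                }
            }
            zip.close();
        }
    }

Run it as, say, 'java DuplicateJarEntries target/my.jar' (the jar path is an example) to see whether the assembly is producing duplicates.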
>> On Wed, Nov 16, 2011 at 6:29 AM, Something Something <
>> mailinglist...@gmail.com> wrote:
>>
>>> Hmm... there must be a different way, 'cause we don't need to do that
>>> to run Pig jobs.
>>>
>>> On Tue, Nov 15, 2011 at 10:58 PM, Daan Gerits <daan.ger...@gmail.com> wrote:
>>>
>>>> There might be different ways, but currently we store our jars on HDFS
>>>> and register them from there. They get copied to the machines once the
>>>> job starts. Is that an option?
>>>>
>>>> Daan.
>>>>
>>>> On 16 Nov 2011, at 07:24, Something Something wrote:
>>>>
>>>> > Until now we were manually copying our jars to all machines in a
>>>> > Hadoop cluster. This worked while our cluster was small, but now our
>>>> > cluster is getting bigger. What's the best way to start a Hadoop job
>>>> > that automatically distributes the jar to all machines in the cluster?
>>>> >
>>>> > I read the doc at:
>>>> > http://hadoop.apache.org/common/docs/current/commands_manual.html#jar
>>>> >
>>>> > Would -libjars do the trick? But we need to use 'hadoop job' for
>>>> > that, right? Until now, we were using 'hadoop jar' to start all our
>>>> > jobs.
>>>> >
>>>> > Needless to say, we are getting our feet wet with Hadoop, so we
>>>> > appreciate your help with our dumb questions.
>>>> >
>>>> > Thanks.
>>>> >
>>>> > PS: We use Pig a lot, which automatically does this, so there must
>>>> > be a clean way to do this.
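For completeness, a minimal sketch of the HDFS-hosted-jar approach Daan describes, using the distributed cache API current in Hadoop 0.20; the HDFS path and class name are made-up examples. This is also roughly what -libjars does under the hood (copy the jars up to HDFS, then add them to the task classpath), and as far as we can tell Pig does much the same with jars you REGISTER on the client, which is why UDF jars only need to live on the client machine:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class SubmitWithHdfsJar {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The jar must already be in HDFS (e.g. via 'hadoop fs -put');
            // each task then pulls it from HDFS onto its local classpath.
            // Call this before constructing the Job, which copies the conf.
            DistributedCache.addFileToClassPath(
                    new Path("/libs/avro-tools-1.5.4.jar"), conf);
            Job job = new Job(conf, "job with an HDFS-side dependency");
            job.setJarByClass(SubmitWithHdfsJar.class);
            // ... set mapper/reducer and input/output paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }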