After working on it for a while, I realized that it was my mistake. It actually works well. But there are two things to make sure of: both CLASSPATH and -libjars.
For CLASSPATH, we need to make sure the dependent jars are included when launching the MapReduce task, while -libjars is also important: it must contain all jars needed by the mapper and reducer. (A minimal driver sketch follows after the quoted thread below.)

Victor

On Wed, Jan 20, 2010 at 2:35 PM, Victor Hsieh <[email protected]> wrote:
> Yes, it can be done that way, and then restarting the cluster (since
> tasktrackers need to know that new jars were added). But I'm maintaining
> a cluster for different users, so I'm looking for a solution that
> doesn't require a restart.
> Thanks,
> Victor
>
> On Wed, Jan 20, 2010 at 2:17 PM, Rekha Joshi <[email protected]> wrote:
>>
>> Not sure what error you get and if it is suggestive, but at times where
>> you place the -libjars option can make a difference. You can try adding
>> the jar to your HADOOP_CLASSPATH and then executing?
>>
>> Cheers,
>> /R
>>
>>
>> On 1/20/10 9:50 AM, "Victor Hsieh" <[email protected]> wrote:
>>
>> Hi,
>>
>> I was trying to run a MapReduce job with some extra jars but failed.
>> It seems that the jars specified on the command line with -libjars
>> were not shipped to the MapReduce workers.
>>
>> After digging into the code, I found that the deprecated API and the
>> current one differ in their -libjars behavior (also -files and
>> -archives). In the deprecated API, JobClient.runJob() copies the
>> -libjars to the DistributedCache (more precisely, GenericOptionsParser
>> parses -libjars, saving it as "tmpjars" in the configuration, and
>> JobClient then uploads tmpjars). However, in the current API, I didn't
>> see anything related (by grepping for tmpjars or similar in
>> hadoop-0.20.1/src/).
>>
>> Is there any helper function or something similar in the current API?
>> Or do I need to do it myself, like what JobClient does?
>>
>> Help appreciated.
>>
>> Victor
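For reference, here is a minimal sketch of what the working setup looks like. The driver extends Configured and implements Tool so that ToolRunner invokes GenericOptionsParser, which is what turns -libjars into the "tmpjars" configuration entry that gets shipped via the DistributedCache. The class name MyDriver, the job name, and the jar paths below are placeholders, not names from the thread:

    // Minimal sketch, assuming Hadoop 0.20.x (class/job names hypothetical).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        // getConf() already carries whatever GenericOptionsParser set,
        // including the "tmpjars" entry produced from -libjars.
        Job job = new Job(getConf(), "my-job");
        // ... set mapper, reducer, input/output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner runs GenericOptionsParser over args before calling
        // run(), stripping and applying -libjars/-files/-archives.
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
      }
    }

Launched with both pieces in place (paths are examples):

    export HADOOP_CLASSPATH=/path/to/dep.jar
    hadoop jar myjob.jar MyDriver -libjars /path/to/dep.jar <in> <out>

HADOOP_CLASSPATH puts dep.jar on the client JVM's classpath so the driver itself can load it, while -libjars ships it to the tasktrackers for the mappers and reducers, matching the two points above.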
