I'm invoking hadoop with the pipes command:

hadoop pipes -jar mytest.jar -inputformat mytest.PriceInputFormat -conf conf/mytest.xml -input mgr/in -output mgr/out -program mgr/bin/TestMgr

I tried the -file and -cacheFile options but when either of these is
passed to hadoop pipes, the command just exits with a usage message.
There must be a way to specify a jar for a job implemented in C++ with
the hadoop Pipes API. The documentation states that record readers and
writers for Pipes jobs can be implemented in java.

I looked at the source code of org.apache.hadoop.mapred.pipes.Submitter
and it's doing the following:

    /**
     * The main entry point and job submitter. It may either be used as
     * a command line-based or API-based method to launch Pipes jobs.
     */
    public class Submitter {
      /**
       * Submit a pipes job based on the command line arguments.
       * @param args
       */
      public static void main(String[] args) throws Exception {
        CommandLineParser cli = new CommandLineParser();
        //...
        if (results.hasOption("-inputformat")) {
          setIsJavaRecordReader(conf, true);
          conf.setInputFormat(getClass(results, "-inputformat", conf,
                                       InputFormat.class));
        }
      }
    }

It is loading the input format class based on the value of the
-inputformat cmdline parameter. That means there should be some way to
package the input format class along with the program binary and other
supporting files.

-Rahul Sood
[EMAIL PROTECTED]

> You should use the -pipes option in the command.
> For the input format, you can pack it into the hadoop core class jar file,
> or put it into the cache file.
>
> 2008/4/8, Rahul Sood <[EMAIL PROTECTED]>:
> >
> > Hi,
> >
> > I implemented a customized input format in Java for a Map Reduce job.
> > The mapper and reducer classes are implemented in C++, using the Hadoop
> > Pipes API.
> >
> > The package documentation for org.apache.hadoop.mapred.pipes states that
> > "The job may consist of any combination of Java and C++ RecordReaders,
> > Mappers, Paritioner, Combiner, Reducer, and RecordWriter"
> >
> > I packaged the input format class in a jar file and ran the job
> > invocation command:
> >
> > hadoop pipes -jar mytest.jar -inputformat mytest.PriceInputFormat -conf
> > conf/mytest.xml -input mgr/in -output mgr/out -program mgr/bin/TestMgr
> >
> > It keeps failing with error ClassNotFoundException
> > Although I've specified the jar file name with the -jar parameter, the
> > input format class still cannot be located. Is there any other means to
> > specify the input format class, or the job jar file, for a Pipes job ?
> >
> > Stack trace:
> >
> > Exception in thread "main" java.lang.ClassNotFoundException: mytest.PriceInputFormat
> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
> >     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
> >     at java.lang.Class.forName0(Native Method)
> >     at java.lang.Class.forName(Class.java:247)
> >     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:524)
> >     at org.apache.hadoop.mapred.pipes.Submitter.getClass(Submitter.java:309)
> >     at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:357)
> >
> > Thanks,
> >
> > Rahul Sood
> > [EMAIL PROTECTED]
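
The quoted stack trace passes through sun.misc.Launcher$AppClassLoader and starts in Submitter.main, which suggests the lookup fails in the JVM that submits the job, before any task runs. The following is a minimal sketch of that kind of by-name lookup in plain Java, with no Hadoop dependency. ClassLookupSketch and canLoad are hypothetical names made up for illustration, and nothing here establishes whether the hadoop launcher places the -jar file on the submitting JVM's class path.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical helper, not part of Hadoop: checks whether a loader can see a class. */
public class ClassLookupSketch {

    /** Returns true if the named class resolves through the given loader. */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Class name defaults to the one from the thread; args[1] is an optional jar path.
        String name = args.length > 0 ? args[0] : "mytest.PriceInputFormat";
        ClassLoader app = ClassLookupSketch.class.getClassLoader();
        System.out.println("application class loader: " + canLoad(name, app));

        if (args.length > 1) {
            URL jar = new File(args[1]).toURI().toURL();
            // A child loader that additionally searches the given jar.
            try (URLClassLoader withJar = new URLClassLoader(new URL[] {jar}, app)) {
                System.out.println("with " + args[1] + ": " + canLoad(name, withJar));
            }
        }
    }
}
```

Running it as `java ClassLookupSketch mytest.PriceInputFormat mytest.jar` prints one line per loader. If the first line reports false and the second true, the class is present in the jar but not visible to the application class loader, which would be consistent with the ClassNotFoundException above.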