I'm invoking hadoop with the pipes command:

hadoop pipes -jar mytest.jar -inputformat mytest.PriceInputFormat -conf
conf/mytest.xml -input mgr/in -output mgr/out -program mgr/bin/TestMgr

I tried the -file and -cacheFile options but when either of these is
passed to hadoop pipes, the command just exits with a usage message.

There must be a way to specify a jar for a job implemented in C++ with
the Hadoop Pipes API. The documentation states that record readers and
writers for Pipes jobs can be implemented in Java. I looked at the
source code of org.apache.hadoop.mapred.pipes.Submitter and it does
the following:

/**
 * The main entry point and job submitter. It may either be used as
 * a command line-based or API-based method to launch Pipes jobs.
 */
public class Submitter {

  /**
   * Submit a pipes job based on the command line arguments.
   * @param args
   */
  public static void main(String[] args) throws Exception {
    CommandLineParser cli = new CommandLineParser();
    // ...
    if (results.hasOption("-inputformat")) {
      setIsJavaRecordReader(conf, true);
      conf.setInputFormat(getClass(results, "-inputformat", conf,
                                   InputFormat.class));
    }
  }
}

It loads the input format class named by the -inputformat
command-line parameter. That means there should be some way to
package the input format class along with the program binary and other
supporting files.
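
To illustrate the lookup, here is a minimal sketch that does not use Hadoop
at all. It assumes, based on the stack trace quoted below, that the class
name is resolved through the application class loader of the JVM running
the submitter. The class InputFormatLookup and its methods canLoad and
withJar are hypothetical names made up for this example; they are not part
of the Pipes API.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: checks whether a class name
// resolves through a given class loader.
public class InputFormatLookup {

    /**
     * Returns true if className can be found through loader.
     * The class is loaded but not initialized. A class whose supertypes
     * are missing raises NoClassDefFoundError, which is not caught here.
     */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Returns a class loader that also searches the given jar file. */
    public static ClassLoader withJar(Path jar, ClassLoader parent) {
        try {
            URL url = jar.toUri().toURL();
            return new URLClassLoader(new URL[] { url }, parent);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad jar path: " + jar, e);
        }
    }

    public static void main(String[] args) {
        // Defaults taken from the command line in this message.
        String className = args.length > 0 ? args[0] : "mytest.PriceInputFormat";
        Path jar = Paths.get(args.length > 1 ? args[1] : "mytest.jar");

        ClassLoader app = ClassLoader.getSystemClassLoader();
        System.out.println("application class loader: " + canLoad(className, app));
        System.out.println("with jar added:           "
                + canLoad(className, withJar(jar, app)));
    }
}
```

Running it from the directory that contains mytest.jar shows whether the
class is visible to the JVM with and without the jar added. If only the
second lookup succeeds, the jar is simply not on the classpath of the JVM
that performs the lookup.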

-Rahul Sood
[EMAIL PROTECTED]

> You should use the -pipes option in the command.
> For the input format, you can pack it into the hadoop core class jar file,
> or put it into the cache file.
> 
> 2008/4/8, Rahul Sood <[EMAIL PROTECTED]>:
> >
> > Hi,
> >
> > I implemented a customized input format in Java for a Map Reduce job.
> > The mapper and reducer classes are implemented in C++, using the Hadoop
> > Pipes API.
> >
> > The package documentation for org.apache.hadoop.mapred.pipes states that
> > "The job may consist of any combination of Java and C++ RecordReaders,
> > Mappers, Paritioner, Combiner, Reducer, and RecordWriter"
> >
> > I packaged the input format class in a jar file and ran the job
> > invocation command:
> >
> > hadoop pipes -jar mytest.jar -inputformat mytest.PriceInputFormat -conf
> > conf/mytest.xml -input mgr/in -output mgr/out -program mgr/bin/TestMgr
> >
> > It keeps failing with a ClassNotFoundException.
> > Although I've specified the jar file name with the -jar parameter, the
> > input format class still cannot be located. Is there any other means to
> > specify the input format class, or the job jar file, for a Pipes job?
> >
> > Stack trace:
> >
> > Exception in thread "main" java.lang.ClassNotFoundException: mytest.PriceInputFormat
> >         at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
> >         at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
> >         at java.lang.Class.forName0(Native Method)
> >         at java.lang.Class.forName(Class.java:247)
> >         at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:524)
> >         at org.apache.hadoop.mapred.pipes.Submitter.getClass(Submitter.java:309)
> >         at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:357)
> >
> > Thanks,
> >
> >
> > Rahul Sood
> > [EMAIL PROTECTED]
> >
> >
> >
