Hello Andrey,

Take a look at the -cacheDir option for streaming; it may help you out:
http://hadoop.apache.org/core/docs/current/streaming.html#Large+files+and+archives+in+Hadoop+Streaming

Thank you,

---
Peeyush

On Tue, 2008-03-11 at 17:30 +0200, Andrey Pankov wrote:
> Hi all,
>
> I'm still new to Hadoop. I'd like to use Hadoop streaming to combine a
> mapper written as a Java class with a reducer written as a C++ program.
> I'm at the beginning of this task and am having trouble with the Java
> class. It looks something like this:
>
>   package org.company;
>   ...
>   public class TestMapper extends MapReduceBase implements Mapper {
>   ...
>     public void map(WritableComparable key, Writable value,
>         OutputCollector output, Reporter reporter) throws IOException {
>   ...
>
> I created a jar file with my class, and it is accessible via $CLASSPATH.
> I'm running the streaming job using
>
>   $HSTREAMING -mapper org.company.TestMapper -reducer "wc -l" -input /data -output /out1
>
> Hadoop cannot find the TestMapper class. I'm using hadoop-0.16.0. The
> error is
>
> ===========================
> 2008-03-07 18:58:07,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
> 2008-03-07 18:58:07,833 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
> 2008-03-07 18:58:07,910 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.company.TestMapper
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:639)
>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:728)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.company.TestMapper
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:607)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:631)
>     ... 6 more
> Caused by: java.lang.ClassNotFoundException: org.company.TestMapper
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:247)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:587)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:605)
>     ... 7 more
> ===========================
>
> Here is what I find interesting. I put some debugging println() calls
> into Hadoop streaming (StreamJob.java and StreamUtil.java). Streaming can
> see TestMapper at the job configuration stage (the StreamJob.setJobConf()
> routine) but cannot see it later. The following code creates a new
> instance of TestMapper and calls the toString() defined in TestMapper.
> It works.
>
>   if (mapCmd_ != null) {
>     c = StreamUtil.goodClassOrNull(mapCmd_, defaultPackage);
>     if (c != null) {
>       System.out.println("#######################");
>       try {
>         System.out.println(c.newInstance().toString());
>       } catch (Exception e) { }
>       System.out.println("#######################");
>       jobConf_.setMapperClass(c);
>     } else {
>       ...
>     }
>   }
>
> I tried adding the jar file with TestMapper using the option
> "-file test_mapper.jar". The result is the same.
>
> Could anybody advise me? Thanks in advance,
>
> ---
> Andrey Pankov.
>
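
A likely explanation for why TestMapper is visible in StreamJob.setJobConf() but not in the map task: the trace goes through TaskTracker$Child.main, which suggests the task runs in a separate JVM that does not inherit the $CLASSPATH of the machine submitting the job. The sketch below is not Hadoop code. It only illustrates the general Java behaviour, assuming the task's loader simply lacks the jar. The names ClasspathDemo and canLoad are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClasspathDemo {
    /** Returns true if the named class can be resolved through the given loader. */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            // initialize=false: we only care whether the class can be found.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = ClasspathDemo.class.getName();

        // "Submitting side": the loader that loaded this program has the class.
        ClassLoader client = ClasspathDemo.class.getClassLoader();

        // "Task side" (assumption): a loader with no jar URLs and no parent
        // that knows about the application classpath.
        ClassLoader task = new URLClassLoader(new URL[0], null);

        System.out.println("client sees class: " + canLoad(name, client));
        System.out.println("task sees class:   " + canLoad(name, task));
    }
}
```

Run as an ordinary program, the first check should succeed and the second should fail with the same kind of ClassNotFoundException seen in the log above. Under that assumption, the class has to reach the classpath of the task JVM itself, not just that of the client.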
