Hi Amareshwari,
I have applied that patch and my job now runs successfully. I had to specify
the jar file with the '-file' option, even though it is available via $CLASSPATH:
$HSTREAMING -mapper org.company.TestMapper -reducer "cat" -input /data \
    -output /out4 -file /path/to/test_mapper.jar
Thanks a lot!
Amareshwari Sriramadasu wrote:
Hi Andrey,
I think that is a classpath problem.
Can you try the patch at
https://issues.apache.org/jira/browse/HADOOP-2622 and see whether you still
have the problem?
Thanks
Amareshwari.
Andrey Pankov wrote:
Hi all,
I'm still new to Hadoop. I'd like to use Hadoop streaming to combine a
mapper written as a Java class with a reducer written as a C++ program.
I'm just starting on this task and am having trouble with the Java class.
It looks something like this:
    package org.company;
    ...
    public class TestMapper extends MapReduceBase implements Mapper {
        ...
        public void map(WritableComparable key, Writable value,
                OutputCollector output, Reporter reporter) throws IOException {
            ...
I created a jar file with my class, and it is accessible via $CLASSPATH.
I'm running the streaming job with
$HSTREAMING -mapper org.company.TestMapper -reducer "wc -l" -input /data \
    -output /out1
Hadoop cannot find the TestMapper class. I'm using hadoop-0.16.0. The
error is
===========================
2008-03-07 18:58:07,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2008-03-07 18:58:07,833 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2008-03-07 18:58:07,910 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.company.TestMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:639)
    at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:728)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.company.TestMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:607)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:631)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: org.company.TestMapper
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:587)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:605)
    ... 7 more
===========================
Here is what I find interesting. I put some debugging println() calls
into Hadoop streaming (StreamJob.java and StreamUtil.java). Streaming can
see TestMapper at the job configuration stage (the StreamJob.setJobConf()
routine) but not later, when the task runs. The following code creates a
new instance of TestMapper and calls the toString() defined in TestMapper.
It works.
    if (mapCmd_ != null) {
        c = StreamUtil.goodClassOrNull(mapCmd_, defaultPackage);
        if (c != null) {
            System.out.println("#######################");
            try {
                System.out.println(c.newInstance().toString());
            } catch (Exception e) { }
            System.out.println("#######################");
            jobConf_.setMapperClass(c);
        } else {
            ...
        }
    }
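
The behaviour described above, where a class resolves during job setup but not inside the task, is consistent with the two lookups happening under different class paths. A minimal, self-contained sketch of that effect follows. It does not use Hadoop at all: ClassVisibilityDemo and loadOrNull are hypothetical names, and an empty URLClassLoader merely stands in for a JVM that was never given the jar.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassVisibilityDemo {

    /** Hypothetical helper: returns the class if the loader can see it, else null. */
    static Class<?> loadOrNull(String name, ClassLoader loader) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            return Class.forName(name, false, loader);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = ClassVisibilityDemo.class.getName();

        // Stand-in for the submitting JVM: the loader that loaded this class.
        ClassLoader client = ClassVisibilityDemo.class.getClassLoader();

        // Stand-in for a task JVM without the jar: no URLs, no application parent.
        try (URLClassLoader task = new URLClassLoader(new URL[0], null)) {
            System.out.println("client sees it: " + (loadOrNull(name, client) != null));
            System.out.println("task sees it:   " + (loadOrNull(name, task) != null));
        }
    }
}
```

The same class name resolves through one loader and not through the other, so a successful lookup at configuration time says nothing about whether the class will be found where the task actually runs.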
I tried adding the jar file containing TestMapper with the option
"-file test_mapper.jar". The result is the same.
Could anybody advise me? Thanks in advance,
---
Andrey Pankov.