Hi all,
I am new to Hadoop, and I would like to use Hadoop Streaming to run Java
programs as the mapper and reducer (I want to port some existing Java
programs that process XML files to Hadoop this way).
As a first test, I used the Hadoop word count example, as follows:
countm.java:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class countm extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
countr.java:
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

public class countr extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
I then ran the following command:
bin/hadoop jar mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -files
test/countr.class -files test/countm.class -mapper test/countm.class -reducer
test/countr.class -input input -output output
Then I got the following output:
12/05/27 16:48:29 INFO security.Groups: Group mapping
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/05/27 16:48:30 WARN conf.Configuration: mapred.used.genericoptionsparser is
deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/05/27 16:48:30 WARN conf.Configuration: mapred.task.id is deprecated.
Instead, use mapreduce.task.attempt.id
packageJobJar: [/root/hadoop-0.21.0/tmp/hadoop-unjar7827631350430711602/] []
/tmp/streamjob2931278447922230194.jar tmpDir=null
12/05/27 16:48:31 INFO mapred.FileInputFormat: Total input paths to process : 1
12/05/27 16:48:31 WARN conf.Configuration: mapred.map.tasks is deprecated.
Instead, use mapreduce.job.maps
12/05/27 16:48:31 INFO mapreduce.JobSubmitter: number of splits:19
12/05/27 16:48:31 INFO mapreduce.JobSubmitter: adding the following namenodes'
delegation tokens:null
12/05/27 16:48:31 INFO streaming.StreamJob: getLocalDirs():
[/root/hadoop-0.21.0/tmp/mapred/local]
12/05/27 16:48:31 INFO streaming.StreamJob: Running job: job_201205242131_0013
12/05/27 16:48:31 INFO streaming.StreamJob: To kill this job, run:
12/05/27 16:48:31 INFO streaming.StreamJob:
/root/hadoop-0.21.0/bin/../bin/hadoop job
-Dmapreduce.jobtracker.address=192.168.204.130:9001 -kill job_201205242131_0013
12/05/27 16:48:31 INFO streaming.StreamJob: Tracking URL:
http://master:50030/jobdetails.jsp?jobid=job_201205242131_0013
12/05/27 16:48:32 INFO streaming.StreamJob: map 0% reduce 0%
12/05/27 16:49:59 INFO streaming.StreamJob: map 100% reduce 100%
12/05/27 16:49:59 INFO streaming.StreamJob: To kill this job, run:
12/05/27 16:49:59 INFO streaming.StreamJob:
/root/hadoop-0.21.0/bin/../bin/hadoop job
-Dmapreduce.jobtracker.address=192.168.204.130:9001 -kill job_201205242131_0013
12/05/27 16:49:59 INFO streaming.StreamJob: Tracking URL:
http://master:50030/jobdetails.jsp?jobid=job_201205242131_0013
12/05/27 16:49:59 ERROR streaming.StreamJob: Job not Successful!
12/05/27 16:49:59 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Am I using it the wrong way, or is there something special that needs to be
done to use Java with Hadoop Streaming?
It would be great if you could help me with this, as I cannot make any
progress until it is solved.
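
For comparison, here is a minimal sketch of the other style of mapper, assuming that Hadoop Streaming expects an executable that reads input lines from standard input and writes key<TAB>value lines to standard output, rather than a class that extends Mapper. The class name StreamWordCountMapper is hypothetical and is not part of the job above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.util.StringTokenizer;

// Hypothetical class name, used only for illustration.
public class StreamWordCountMapper {

    // Reads lines from `in` and writes one "word<TAB>1" line per token to `out`.
    static void map(Reader in, PrintStream out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                out.println(tokenizer.nextToken() + "\t1");
            }
        }
        out.flush();
    }

    // Assumption: streaming would feed the input split on stdin
    // and collect the mapper output from stdout.
    public static void main(String[] args) throws IOException {
        map(new InputStreamReader(System.in), System.out);
    }
}
```

A class like this would presumably be given to -mapper as a command line that starts the JVM (something like "java StreamWordCountMapper") instead of as a .class file. That invocation is an assumption and may need classpath adjustments on the task nodes.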
Thank you in advance. :)
Thanks & best regards,
wenjing