hadoop streaming using java as mapper & reducer

HU Wenjing A Sun, 27 May 2012 23:51:07 -0700

Hi all,

I am new to Hadoop, and I would like to use Hadoop streaming to run Java
programs as the mapper and reducer (I want to port some existing Java
programs that process XML files to Hadoop streaming).
As a first try, I used the Hadoop wordcount example, as follows:


countm.java:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class countm extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emit (token, 1) for every whitespace-separated token in the input line.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}


countr.java:

import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

public class countr extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Sum the counts collected for each word and emit (word, total).
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Then I ran the following command:

bin/hadoop jar mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
    -files test/countr.class -files test/countm.class \
    -mapper test/countm.class -reducer test/countr.class \
    -input input -output output

and got the following output:
12/05/27 16:48:29 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/05/27 16:48:30 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/05/27 16:48:30 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
packageJobJar: [/root/hadoop-0.21.0/tmp/hadoop-unjar7827631350430711602/] [] /tmp/streamjob2931278447922230194.jar tmpDir=null
12/05/27 16:48:31 INFO mapred.FileInputFormat: Total input paths to process : 1
12/05/27 16:48:31 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/05/27 16:48:31 INFO mapreduce.JobSubmitter: number of splits:19
12/05/27 16:48:31 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
12/05/27 16:48:31 INFO streaming.StreamJob: getLocalDirs(): [/root/hadoop-0.21.0/tmp/mapred/local]
12/05/27 16:48:31 INFO streaming.StreamJob: Running job: job_201205242131_0013
12/05/27 16:48:31 INFO streaming.StreamJob: To kill this job, run:
12/05/27 16:48:31 INFO streaming.StreamJob: /root/hadoop-0.21.0/bin/../bin/hadoop job  -Dmapreduce.jobtracker.address=192.168.204.130:9001 -kill job_201205242131_0013
12/05/27 16:48:31 INFO streaming.StreamJob: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201205242131_0013
12/05/27 16:48:32 INFO streaming.StreamJob:  map 0%  reduce 0%
12/05/27 16:49:59 INFO streaming.StreamJob:  map 100%  reduce 100%
12/05/27 16:49:59 INFO streaming.StreamJob: To kill this job, run:
12/05/27 16:49:59 INFO streaming.StreamJob: /root/hadoop-0.21.0/bin/../bin/hadoop job  -Dmapreduce.jobtracker.address=192.168.204.130:9001 -kill job_201205242131_0013
12/05/27 16:49:59 INFO streaming.StreamJob: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201205242131_0013
12/05/27 16:49:59 ERROR streaming.StreamJob: Job not Successful!
12/05/27 16:49:59 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
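
From what I have read, Hadoop streaming hands each input record to the mapper
on standard input and expects tab-separated key/value lines back on standard
output, rather than calling a Mapper subclass. If that is right, perhaps my
mapper needs to look more like the sketch below. This is only my guess and I
have not run it on the cluster; the class name StreamWordCountMapper and the
helper method run are names I made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.StringTokenizer;

// Hypothetical class name; this is a plain Java program, not a Hadoop Mapper.
class StreamWordCountMapper {

    // Reads lines from `in` and writes one "word<TAB>1" line per token to `out`.
    static void run(BufferedReader in, PrintStream out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Key and value are separated by a tab, one pair per line.
                out.print(tokenizer.nextToken() + "\t1\n");
            }
        }
        out.flush();
    }

    // Entry point: wire the helper to standard input and standard output.
    public static void main(String[] args) throws IOException {
        run(new BufferedReader(new InputStreamReader(System.in)), System.out);
    }
}
```

I assume the -mapper option would then have to name a command that starts this
program instead of a path to a .class file, but I have not verified that.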


I am not sure whether I am using it the wrong way, or whether something
special needs to be done to use Java with Hadoop streaming.
It would be great if you could help me with this, as I cannot move any further
without solving it.
Thank you in advance.  : )

Thanks & best regards,
wenjing
