run the pipes wordcount example with nopipe

Jianmin Woo Wed, 08 Apr 2009 08:01:14 -0700

Hi, 

After several days of investigation, I finally got the wordcount-nopipe example 
in the hadoop-0.19.1 package to run. However, I had to make some changes, and I 
am not sure they are the proper/correct way to do it. Could anyone please help verify?


1. Start the job with the -inputformat argument set to 
"org.apache.hadoop.mapred.pipes.WordCountInputFormat".
2. Since the C++ RecordReader/Writer is used, no Java-based 
RecordReader/Writer should be used. However, I got an error like 
"RecordReader defined while not needed" from pipes. After checking 
org.apache.hadoop.mapred.pipes.Submitter.java, I found this code snippet: 

if (results.hasOption("-inputformat")) {
  setIsJavaRecordReader(job, true);
  job.setInputFormat(getClass(results, "-inputformat", job,
                              InputFormat.class));
}

So it seems that when -inputformat is specified, the Java RecordReader is 
enabled, which caused the error above. I then commented out the line 
"setIsJavaRecordReader(job, true);", and the example ran. Is this the 
proper way to make the wordcount-nopipe example work? I also see in the code 
a commented-out line "//cli.addArgument("javareader", false, "is the 
RecordReader in Java");". Should this line be uncommented so that the 
Java RecordReader can be disabled from the command line?

3. It seems that wordcount-nopipe only works when the input and output use 
local URIs such as file:///home/...  Is this true?
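
To make the question in item 2 concrete, here is a minimal stand-alone sketch of one way the reader flag could be separated from -inputformat. It is not the Submitter code: a plain Map stands in for the JobConf, the -javareader option and the parseArgs/configure helpers are hypothetical, and the two property names are my assumption about what setIsJavaRecordReader and setInputFormat write.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stand-alone model of the option handling asked about in item 2.
 * No Hadoop classes are used: a plain Map stands in for the JobConf.
 */
final class SubmitterOptionSketch {

    // Assumed to be the property that setIsJavaRecordReader(job, ...) writes.
    static final String JAVA_READER_KEY = "hadoop.pipes.java.recordreader";
    // Assumed to be the property behind job.setInputFormat(...).
    static final String INPUT_FORMAT_KEY = "mapred.input.format.class";

    /** Hypothetical helper: reads "-name value" pairs into a map. */
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            opts.put(args[i], args[i + 1]);
        }
        return opts;
    }

    /** Builds the stand-in job configuration from the command line. */
    static Map<String, String> configure(String[] args) {
        Map<String, String> opts = parseArgs(args);
        Map<String, String> job = new HashMap<>();

        boolean javaReader = false;
        if (opts.containsKey("-inputformat")) {
            job.put(INPUT_FORMAT_KEY, opts.get("-inputformat"));
            // Same default as the snippet above: -inputformat implies a Java reader.
            javaReader = true;
        }
        // Hypothetical -javareader option: an explicit value overrides the default.
        if (opts.containsKey("-javareader")) {
            javaReader = Boolean.parseBoolean(opts.get("-javareader"));
        }
        job.put(JAVA_READER_KEY, String.valueOf(javaReader));
        return job;
    }

    public static void main(String[] args) {
        Map<String, String> job = configure(new String[] {
            "-inputformat", "org.apache.hadoop.mapred.pipes.WordCountInputFormat",
            "-javareader", "false"
        });
        System.out.println(JAVA_READER_KEY + " = " + job.get(JAVA_READER_KEY));
    }
}
```

With this shape, existing command lines that pass only -inputformat would keep the current behaviour, while the nopipe example could pass the extra option instead of requiring a change to Submitter.java.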

Thanks,
Jianmin
