Hi,
After several days of investigation, I finally got the wordcount-nopipe example
in the hadoop-0.19.1 package to run. However, I had to make some changes and I
am not sure they are the proper/correct way to do it. Could anyone please help
verify?
1. Start the job with the -inputformat argument set to
"org.apache.hadoop.mapred.pipes.WordCountInputFormat".
2. Since the C++ RecordReader/Writer is used, no Java-based RecordReader/Writer
should be used. However, I got the error "RecordReader defined while not
needed" from pipes. After checking
org.apache.hadoop.mapred.pipes.Submitter.java, I found this code snippet:
    if (results.hasOption("-inputformat")) {
      setIsJavaRecordReader(job, true);
      job.setInputFormat(getClass(results, "-inputformat", job,
                                  InputFormat.class));
    }
So it seems that specifying -inputformat also enables the Java RecordReader,
which caused the error above. I then commented out the line
"setIsJavaRecordReader(job, true);", and the example ran. Is this the proper
way to make the wordcount-nopipe example work? I also see that the code
contains a commented-out line
"//cli.addArgument("javareader", false, "is the RecordReader in Java");".
Should this line be uncommented so that the Java RecordReader can be disabled
with a command-line option?
3. It seems that wordcount-nopipe only works when the input/output paths are
local URIs such as file:///home/... Is this true?
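To make point 2 concrete, here is a small, Hadoop-free Java model of the logic
quoted above. It is only a sketch under assumptions: the class
PipesOptionsSketch, its Settings holder and the methods current/decoupled are
hypothetical names; options are modeled as a plain Map instead of the parser
Submitter really uses; and "-javareader" is taken from the commented-out
cli.addArgument line, which Submitter in 0.19.1 does not actually parse.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, Hadoop-free model of the Submitter option handling quoted above.
 * It tracks only two settings: the input format class name and whether the
 * Java RecordReader is enabled. "-javareader" is an ASSUMED option taken from
 * the commented-out cli.addArgument line; Submitter 0.19.1 does not parse it.
 */
public class PipesOptionsSketch {

    /** The two job settings this sketch cares about. */
    static final class Settings {
        final String inputFormat;       // null means "not set"
        final boolean javaRecordReader; // true means the Java RecordReader is used

        Settings(String inputFormat, boolean javaRecordReader) {
            this.inputFormat = inputFormat;
            this.javaRecordReader = javaRecordReader;
        }
    }

    /** Mirrors the quoted snippet: -inputformat always turns on the Java RecordReader. */
    static Settings current(Map<String, String> opts) {
        boolean javaReader = false;
        String inputFormat = null;
        if (opts.containsKey("-inputformat")) {
            javaReader = true; // corresponds to setIsJavaRecordReader(job, true)
            inputFormat = opts.get("-inputformat");
        }
        return new Settings(inputFormat, javaReader);
    }

    /**
     * Possible alternative (assumption, not Hadoop's code): set the input format
     * if given, but enable the Java RecordReader only when -javareader is present.
     */
    static Settings decoupled(Map<String, String> opts) {
        String inputFormat = opts.get("-inputformat"); // may be null
        boolean javaReader = opts.containsKey("-javareader");
        return new Settings(inputFormat, javaReader);
    }

    public static void main(String[] args) {
        Map<String, String> opts = new HashMap<>();
        opts.put("-inputformat", "org.apache.hadoop.mapred.pipes.WordCountInputFormat");

        Settings before = current(opts);
        Settings after = decoupled(opts);
        System.out.println("current:   javaRecordReader=" + before.javaRecordReader);
        System.out.println("decoupled: javaRecordReader=" + after.javaRecordReader);
    }
}
```

With only -inputformat given, the "current" model reports the Java RecordReader
as enabled and the "decoupled" model reports it as disabled, which is the same
difference that commenting out setIsJavaRecordReader(job, true) makes in the
real Submitter.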
Thanks,
Jianmin