InputFormat's method

    RecordReader<K, V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException;

returns a RecordReader. You can implement your own InputFormat and RecordReader:
1) the RecordReader keeps the FileSplit (a subclass of InputSplit) it was given as a field of its class;
2) the RecordReader's createValue() method always returns the FileSplit's file field, and next() must not overwrite that value, so the mapper receives the file name as its value.
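To make the two steps above concrete, here is a rough plain-Python sketch of the idea. It does not use the Hadoop API at all: FileSplit, FileNameRecordReader, word_file_map and run are hypothetical stand-ins invented for illustration, and the choice of the line text as the key and the file name as the value is an assumption, not something Hadoop prescribes.

```python
from collections import namedtuple

# Hypothetical stand-in for Hadoop's FileSplit: only a path and the split's text.
FileSplit = namedtuple("FileSplit", ["path", "text"])


class FileNameRecordReader:
    """Toy reader that remembers its split and hands out the file name as the value."""

    def __init__(self, split):
        # Step 1: remember the split this reader was created for.
        self.split = split

    def records(self):
        # Step 2: every record carries the split's file name as its value.
        # Assumption: the key is the line text, so the mapper can still see the words.
        for line in self.split.text.splitlines():
            yield line, self.split.path


def word_file_map(key, value):
    """Emit (word, file name) pairs instead of (word, 1)."""
    for w in key.split():
        yield w, value


def run(splits):
    """Drive the toy reader and mapper over every split and collect the pairs."""
    pairs = []
    for split in splits:
        reader = FileNameRecordReader(split)
        for key, value in reader.records():
            pairs.extend(word_file_map(key, value))
    return pairs


if __name__ == "__main__":
    demo = [FileSplit("x.txt", "water hadoop"), FileSplit("y.txt", "hadoop")]
    for pair in run(demo):
        print(pair)
```

In a real job the same shape would live in a custom InputFormat whose getRecordReader() returns such a reader; the sketch only shows where the file name comes from and how it reaches the mapper.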
Hope this helps.

heyongqiang
2008-07-09

From: Sandy
Sent: 2008-07-09 01:45:15
To: [email protected]
Cc:
Subject: modified word count example

Hi,

Let's say I want to run a map reduce job on a series of text files (let's say x.txt, y.txt and z.txt).

Given the following mapper function in Python (from WordCount.py):

    class WordCountMap(Mapper, MapReduceBase):
        one = IntWritable(1)  # removed
        def map(self, key, value, output, reporter):
            for w in value.toString().split():
                output.collect(Text(w), self.one)  # how can I modify this line?

Instead of creating pairs for each word found and the numeral one as the example is doing, is there a function I can invoke to store the name of the file it came from instead?

Thus, I'd have pairs like
<"water", "x.txt">
<"hadoop", "y.txt">
<"hadoop", "z.txt">
etc.

I took a look at the javadoc, but I'm not sure if I've checked in the right places. Could someone point me in the right direction?

Thanks!

-SM
