Where can I find the Reverse-Index application?

heyongqiang
2008-07-09

From: Shengkai Zhu
Sent: 2008-07-09 09:06:38
To: [email protected]
Cc:
Subject: Re: modified word count example

Another MapReduce application, Reverse-Index, behaves much as you describe. You can refer to that.

On 7/9/08, heyongqiang <[EMAIL PROTECTED]> wrote:
>
> InputFormat's method
>   RecordReader<K, V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException;
> returns a RecordReader.
> You can implement your own InputFormat and RecordReader:
> 1) the RecordReader remembers the FileSplit (a subclass of InputSplit) in a field of its class
> 2) the RecordReader's createValue() method always returns the FileSplit's file field.
>
> Hope this helps.
>
> heyongqiang
> 2008-07-09
>
> From: Sandy
> Sent: 2008-07-09 01:45:15
> To: [email protected]
> Cc:
> Subject: modified word count example
>
> Hi,
>
> Let's say I want to run a map reduce job on a series of text files (let's
> say x.txt, y.txt and z.txt).
>
> Given the following mapper function in Python (from WordCount.py):
>
> class WordCountMap(Mapper, MapReduceBase):
>     one = IntWritable(1)  # removed
>     def map(self, key, value, output, reporter):
>         for w in value.toString().split():
>             output.collect(Text(w), self.one)  # how can I modify this line?
>
> Instead of creating pairs of each word found and the numeral one, as the
> example does, is there a function I can invoke to store the name of the
> file it came from instead?
>
> Thus, I'd have pairs like <"water", "x.txt">, <"hadoop", "y.txt">,
> <"hadoop", "z.txt">, etc.
>
> I took a look at the javadoc, but I'm not sure if I've checked in the right
> places. Could someone point me in the right direction?
>
> Thanks!
>
> -SM
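
A reverse (inverted) index maps each word to the files that contain it, which is the <word, filename> pairing asked about in the quoted message. The following is only a minimal plain-Python sketch of that idea, not Hadoop code and not the Reverse-Index application itself: map_words, reduce_files and the sample file contents are hypothetical names and data made up for illustration. In a real job the file name would have to come from the input split, as the quoted reply suggests.

```python
from collections import defaultdict


def map_words(filename, text):
    """Map step: emit one (word, filename) pair per word, instead of (word, 1)."""
    for word in text.split():
        yield word, filename


def reduce_files(pairs):
    """Reduce step: group pairs by word and collect the distinct file names."""
    index = defaultdict(set)
    for word, filename in pairs:
        index[word].add(filename)
    # Sort the file names so the result is deterministic.
    return {word: sorted(files) for word, files in index.items()}


# Hypothetical file contents, made up for illustration only.
documents = {
    "x.txt": "water",
    "y.txt": "hadoop",
    "z.txt": "hadoop water",
}

# Run the map step over every file, then reduce all emitted pairs.
pairs = [pair for name, text in documents.items() for pair in map_words(name, text)]
index = reduce_files(pairs)
print(index["hadoop"])  # ['y.txt', 'z.txt']
```

The only change from word count is the emitted value: the file name replaces the constant one, and the reduce step collects names where word count would sum the ones.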
