Re: Re: modified word count example

heyongqiang Tue, 08 Jul 2008 19:09:12 -0700

Where can I find the Reverse-Index application?




heyongqiang
2008-07-09



From: Shengkai Zhu
Sent: 2008-07-09 09:06:38
To: [email protected]
Cc:
Subject: Re: modified word count example

Another MapReduce application, Reverse-Index, behaves similarly to what you
describe.
You can refer to that.
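
For readers without the Reverse-Index source at hand, here is a minimal
plain-Python sketch of the idea. It does not use the Hadoop API: the
function names (read_records, map_record, reduce_pairs, build_reverse_index)
and the in-memory dict of files are assumptions made for illustration only.
The reader stands in for a custom RecordReader that hands the file name to
the mapper, and the mapper emits (word, file name) pairs instead of (word, 1).

```python
from collections import defaultdict


def read_records(files):
    """Hypothetical stand-in for a custom RecordReader.

    `files` maps a file name to its text; each record pairs the file
    name with one line, so the mapper knows where the line came from.
    """
    for file_name, text in files.items():
        for line in text.splitlines():
            yield file_name, line


def map_record(file_name, line):
    """Emit (word, file_name) for every word, instead of (word, 1)."""
    for word in line.split():
        yield word, file_name


def reduce_pairs(pairs):
    """Group the pairs by word and collect the distinct file names."""
    index = defaultdict(set)
    for word, file_name in pairs:
        index[word].add(file_name)
    # Sort the file names so the result is deterministic.
    return {word: sorted(names) for word, names in index.items()}


def build_reverse_index(files):
    """Run the read, map, and reduce steps in sequence, in memory."""
    pairs = (
        pair
        for file_name, line in read_records(files)
        for pair in map_record(file_name, line)
    )
    return reduce_pairs(pairs)


if __name__ == "__main__":
    sample = {"x.txt": "water", "y.txt": "hadoop", "z.txt": "hadoop"}
    for word, names in sorted(build_reverse_index(sample).items()):
        print(word, names)
```

In a real job the grouping by word is done by the framework between map and
reduce. As far as I recall, the old mapred API also exposes the current
input file to the map task through the "map.input.file" job property, but
please check that against your Hadoop version.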


On 7/9/08, heyongqiang  <[EMAIL PROTECTED] > wrote:
>
> InputFormat's method RecordReader<K, V> getRecordReader(InputSplit split,
> JobConf job, Reporter reporter) throws IOException returns a RecordReader.
> You can implement your own InputFormat and RecordReader:
> 1) The RecordReader keeps the FileSplit (a subclass of InputSplit) as a
> field of its class.
> 2) The RecordReader's createValue() method always returns the FileSplit's
> file field.
>
> Hope this helps.
>
>
>
> heyongqiang
> 2008-07-09
>
>
>
> From: Sandy
> Sent: 2008-07-09 01:45:15
> To: [email protected]
> Cc:
> Subject: modified word count example
>
> Hi,
>
> Let's say I want to run a map reduce job on a series of text files (let's
> say x.txt y.txt and z.txt)
>
> Given the following mapper function in Python (from WordCount.py):
>
> class WordCountMap(Mapper, MapReduceBase):
>    one = IntWritable(1) # removed
>    def map(self, key, value, output, reporter):
>        for w in value.toString().split():
>            output.collect(Text(w), self.one) #how can I modify this line?
>
> Instead of creating pairs for each word found and the numeral one as the
> example is doing, is there a function I can invoke to store the name of the
> file it came from instead?
>
> Thus, I'd have pairs like <"water", "x.txt">, <"hadoop", "y.txt">,
> <"hadoop", "z.txt">, etc.
>
> I took a look at the Javadoc, but I'm not sure I've checked the right
> places. Could someone point me in the right direction?
>
> Thanks!
>
> -SM
>
