Oh-ha, that's simple. :)
/Edward J. Yoon
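
For reference, the aggregation Miles suggests in the quoted message below can be sketched in plain Java, without any Hadoop classes. This is only an illustration under stated assumptions: the reserved key "__TOTAL__", the "[a-z]+" token pattern, the whitespace split, and the class and method names are all made up for the example and do not come from the original code. The idea is that a field such as spamTotal is not shared between map tasks, so the total is emitted as just another key and summed by the same sum-up step. The division is done in a second pass, once both sums are known.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Plain-Java simulation of the map and reduce steps; no Hadoop classes are used.
class SpamProbabilitySketch {
    // Hypothetical reserved key that carries the category-wide token total.
    static final String TOTAL_KEY = "__TOTAL__";
    // Assumed token pattern; the original wordregex is not shown in the thread.
    static final Pattern WORD = Pattern.compile("[a-z]+");

    // "Map": emit (word, 1) and (TOTAL_KEY, 1) for every matching token.
    // "Reduce": sum the values per key. Both are folded into Map.merge here.
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                String word = token.toLowerCase();
                if (WORD.matcher(word).matches()) {
                    counts.merge(word, 1L, Long::sum);
                    counts.merge(TOTAL_KEY, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    // Second pass: P(word|category) = count(word) / total tokens in the category.
    static Map<String, Double> wordProbabilities(List<String> lines) {
        Map<String, Long> counts = countWords(lines);
        long total = counts.getOrDefault(TOTAL_KEY, 0L);
        Map<String, Double> probs = new HashMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (!e.getKey().equals(TOTAL_KEY) && total > 0) {
                probs.put(e.getKey(), (double) e.getValue() / total);
            }
        }
        return probs;
    }

    public static void main(String[] args) {
        List<String> spam = Arrays.asList("buy cheap pills", "cheap cheap offer");
        System.out.println(wordProbabilities(spam));
    }
}
```

In a real job the reduce call for a given word never sees the value summed under the total key, which is why the sketch separates counting from the final division.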
On Tue, Oct 7, 2008 at 7:14 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> This is a well-known problem. Basically, you want to aggregate values
> computed at some previous step.
>
> Emit <category, probability> pairs and have the reducer simply sum up
> the probabilities for a given category.
>
> (It is the same task as summing up the word counts.)
>
> Miles
>
> 2008/10/7 Edward J. Yoon <[EMAIL PROTECTED]>:
>> I would like to get the spam probability P(word|category) of the words
>> from the files of a category (bad/good e-mails), as described below.
>> To compute it in the reducer, I need the sum of "spamTotal" across all
>> map tasks. How can I get it?
>>
>> Map:
>>
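>> /**
>>  * Counts word frequency
>>  */
>> // The class declaration below is assumed; the original post omits it.
>> public static class Map extends MapReduceBase implements
>>     Mapper<LongWritable, Text, Text, FloatWritable> {
>>
>>   public void map(LongWritable key, Text value,
>>       OutputCollector<Text, FloatWritable> output, Reporter reporter)
>>       throws IOException {
>>     String line = value.toString();
>>     // splitregex and wordregex are defined elsewhere (not shown)
>>     String[] tokens = line.split(splitregex);
>>
>>     // For every word token
>>     for (int i = 0; i < tokens.length; i++) {
>>       String word = tokens[i].toLowerCase();
>>       Matcher m = wordregex.matcher(word);
>>       if (m.matches()) {
>>         // spamTotal and count are declared elsewhere (not shown)
>>         spamTotal++;
>>         output.collect(new Text(word), count);
>>       }
>>     }
>>   }
>> }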
>>
>> Reduce:
>>
>> /**
>>  * Computes bad count / total bad words
>>  */
>> public static class Reduce extends MapReduceBase implements
>>     Reducer<Text, FloatWritable, Text, FloatWritable> {
>>
>>   public void reduce(Text key, Iterator<FloatWritable> values,
>>       OutputCollector<Text, FloatWritable> output, Reporter reporter)
>>       throws IOException {
>>     int sum = 0;
>>     while (values.hasNext()) {
>>       sum += (int) values.next().get();
>>     }
>>
>>     FloatWritable badProb = new FloatWritable((float) sum / spamTotal);
>>     output.collect(key, badProb);
>>   }
>> }
>>
>>
>> --
>> Best regards, Edward J. Yoon
>> [EMAIL PROTECTED]
>> http://blog.udanax.org
>>
>
>
>
>
--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org