Oh-ha, that's simple. :)
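
For the archive, here is a minimal sketch of the suggestion quoted below. It is plain Java that simulates the map and reduce steps in memory; it is not the Hadoop API. The class name, the reserved key `__TOTAL__`, and both regex patterns are assumptions made for illustration (the original `splitregex` and `wordregex` are not shown in the thread). The idea is that each mapper emits one extra pair under the reserved key for every word it counts, so the total is summed exactly like any other word count.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

class SpamProbabilitySketch {
    // Hypothetical reserved key that carries the running word total.
    static final String TOTAL_KEY = "__TOTAL__";
    // Assumed patterns; the thread does not show the real ones.
    static final Pattern SPLIT = Pattern.compile("\\s+");
    static final Pattern WORD = Pattern.compile("[a-z]+");

    // "Map": emit (word, 1) for each matching token, plus (TOTAL_KEY, 1).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : SPLIT.split(line)) {
            String word = token.toLowerCase();
            if (WORD.matcher(word).matches()) {
                out.add(new SimpleEntry<>(word, 1));
                out.add(new SimpleEntry<>(TOTAL_KEY, 1));
            }
        }
        return out;
    }

    // "Reduce": sum the values for each key, as in a word count.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Follow-up step: divide each word count by the summed total.
    static Map<String, Double> probabilities(Map<String, Integer> sums) {
        Map<String, Double> probs = new TreeMap<>();
        Integer total = sums.get(TOTAL_KEY);
        if (total == null || total == 0) {
            return probs; // nothing was counted
        }
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            if (!e.getKey().equals(TOTAL_KEY)) {
                probs.put(e.getKey(), e.getValue() / (double) total);
            }
        }
        return probs;
    }
}
```

Feeding the lines "buy cheap pills" and "Cheap cheap offer" through `map`, `reduce`, and `probabilities` counts six words in total, three of them "cheap", so "cheap" comes out at 0.5. In a real job each reducer sees only one key at a time, so the division would probably have to happen in a follow-up step once the total is known.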

/Edward J. Yoon

On Tue, Oct 7, 2008 at 7:14 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> This is a well-known problem. Basically, you want to aggregate values
> computed at some previous step.
>
> Emit <category,probability> pairs and have the reducer simply sum up
> the probabilities for a given category.
>
> (It is the same task as summing up the word counts.)
>
> Miles
>
> 2008/10/7 Edward J. Yoon <[EMAIL PROTECTED]>:
>> I would like to get the spam probability P(word|category) of the words
>> in the files of a category (bad/good e-mails), as described below. To
>> compute it in the reduce step, I need the sum of "spamTotal" across all
>> map tasks. How can I get it?
>>
>> Map:
>>
>>    /**
>>     * Counts word frequency
>>     */
>>    public void map(LongWritable key, Text value,
>>        OutputCollector<Text, FloatWritable> output, Reporter reporter)
>>        throws IOException {
>>      String line = value.toString();
>>      String[] tokens = line.split(splitregex);
>>
>>      // For every word token
>>      for (int i = 0; i < tokens.length; i++) {
>>        String word = tokens[i].toLowerCase();
>>        Matcher m = wordregex.matcher(word);
>>        if (m.matches()) {
>>          spamTotal++;
>>          output.collect(new Text(word), count);
>>        }
>>      }
>>    }
>>  }
>>
>> Reduce:
>>
>>  /**
>>   * Computes bad count / total bad words
>>   */
>>  public static class Reduce extends MapReduceBase implements
>>      Reducer<Text, FloatWritable, Text, FloatWritable> {
>>
>>    public void reduce(Text key, Iterator<FloatWritable> values,
>>        OutputCollector<Text, FloatWritable> output, Reporter reporter)
>>        throws IOException {
>>      int sum = 0;
>>      while (values.hasNext()) {
>>        sum += (int) values.next().get();
>>      }
>>
>>      FloatWritable badProb = new FloatWritable((float) sum / spamTotal);
>>      output.collect(key, badProb);
>>    }
>>  }
>>
>>
>> --
>> Best regards, Edward J. Yoon
>> [EMAIL PROTECTED]
>> http://blog.udanax.org
>>
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>



-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
