Re: help with reduce phase understanding

Jean-Daniel Cryans Thu, 31 Jul 2008 18:17:47 -0700

Pavel,

Each map task processes exactly one region, and a row is stored in only one
region. All values for a given intermediate key also go to a single reducer,
so there will be no stale data in this situation.
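
To make the "one key, one reducer" point concrete, here is a minimal standalone
sketch in plain Java with no Hadoop or HBase dependency. The class name
PartitionSketch and the author keys are made up for illustration. The
partitionFor formula mirrors what Hadoop's default HashPartitioner does, but
treat the rest as a simulation under those assumptions, not as the framework's
actual shuffle code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation: shows that a key is always routed to one reducer.
public class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner: depends only on the key.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Routes every emitted key to its reducer and counts occurrences there,
    // which is what the quoted reduce() does with its values iterator.
    static List<Map<String, Integer>> shuffleAndCount(
            List<List<String>> mapperOutputs, int numReducers) {
        List<Map<String, Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new TreeMap<>());
        }
        for (List<String> oneMapper : mapperOutputs) {
            for (String key : oneMapper) {
                int target = partitionFor(key, numReducers);
                reducers.get(target).merge(key, 1, Integer::sum);
            }
        }
        return reducers;
    }

    public static void main(String[] args) {
        // Two made-up mappers; "alice" is emitted 5 times by one and 3 by the other.
        List<List<String>> mapperOutputs = Arrays.asList(
                Arrays.asList("alice", "alice", "alice", "alice", "alice", "bob"),
                Arrays.asList("alice", "alice", "alice", "carol"));

        List<Map<String, Integer>> reducers = shuffleAndCount(mapperOutputs, 2);
        for (int i = 0; i < reducers.size(); i++) {
            System.out.println("reducer " + i + " -> " + reducers.get(i));
        }
    }
}
```

Because the target reducer depends only on the key, "alice" lands in exactly
one reducer and is counted once, as 8. It is never split into a 5 and a 3 that
overwrite each other in the output table.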


J-D

On Wed, Jul 30, 2008 at 10:09 AM, Pavel <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I don't fully understand the MapReduce approach and would like to ask some
> questions (mainly about the reduce part). Below is a reduce job that counts
> the values for a given row key and inserts the result into another table
> using the same row key.
>
> What makes me doubt is that I cannot figure out how that code would work if
> several reducers are running. Is it possible that they will process values
> for the same row key and, as a consequence, write stale data into the table?
> Say reducerA has counted a total of 5 messages and reducerB 3 messages:
> would that all end up as the value 8 in the resulting table?
>
> Thank you.
> Pavel
>
> public class MessagesTableReduce extends TableReduce<Text, LongWritable> {
>
>    public void reduce(Text key, Iterator<LongWritable> values,
>            OutputCollector<Text, MapWritable> output, Reporter reporter)
>            throws IOException {
>
>        System.out.println("REDUCE: processing messages for author: "
>                + key.toString());
>
>        int total = 0;
>        while (values.hasNext()) {
>            values.next();
>            total++;
>        }
>
>        MapWritable map = new MapWritable();
>        map.put(new Text("messages:sent"),
>                new ImmutableBytesWritable(String.valueOf(total).getBytes()));
>        output.collect(key, map);
>    }
> }
>
