Actually, there is a RowCounter under the mapred package. There is a bug in the 0.2.0 candidate release, but it was fixed yesterday. You may want to check the new one (see https://issues.apache.org/jira/browse/HBASE-791)
I would have done so but I probably have a bigger hdfs problem on my cluster :-)

Thanks
-Yair

-----Original Message-----
From: Pavel [mailto:[EMAIL PROTECTED]
Sent: Friday, August 01, 2008 8:37 AM
To: [email protected]
Subject: Re: help with reduce phase understanding

Thank you a lot for your answer, Jean-Daniel. I think I now understand how that scenario works.

I have another scenario (probably not doable with mapred, though) - I need to get the total row count for the whole table. I think I could use Reporter to increment a counter in the map phase, but how can I then get the counter value saved into the 'results' table? Can you please advise how I can achieve that? Also, what is the preferred way to get a table row count?

Thank you for your help!
Pavel

2008/8/1 Jean-Daniel Cryans <[EMAIL PROTECTED]>

> Pavel,
>
> Since each map processes only one region, a row is stored in only one
> region, and all intermediate keys from a given mapper go to a single
> reducer, there will be no stale data in this situation.
>
> J-D
>
> On Wed, Jul 30, 2008 at 10:09 AM, Pavel <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I feel I lack an understanding of the mapreduce approach and would like
> > to ask some questions (mainly on its reduce part). Below is a reduce job
> > that gets the value count for a given row key and inserts the resulting
> > value into another table using the same row key.
> >
> > What makes me doubt is that I cannot figure out how that code would work
> > if several reducers are running. Is it possible that they will process
> > values for the same row key and, as a consequence, write stale data into
> > the table? Say reducerA has counted a total of 5 messages while reducerB
> > counted 3 messages; would that all end up as a value of 8 in the
> > resulting table?
> >
> > Thank you.
> > Pavel
> >
> > public class MessagesTableReduce extends TableReduce<Text, LongWritable> {
> >
> >     public void reduce(Text key, Iterator<LongWritable> values,
> >             OutputCollector<Text, MapWritable> output, Reporter reporter)
> >             throws IOException {
> >
> >         System.out.println("REDUCE: processing messages for author: " +
> >                 key.toString());
> >
> >         int total = 0;
> >         while (values.hasNext()) {
> >             values.next();
> >             total++;
> >         }
> >
> >         MapWritable map = new MapWritable();
> >         map.put(new Text("messages:sent"), new
> >                 ImmutableBytesWritable(String.valueOf(total).getBytes()));
> >         output.collect(key, map);
> >     }
> > }
> >
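For the counter question in the thread above, one rough sketch (not tested against the 0.2 APIs; the class name, counter enum, and column names are made up for illustration) is to increment a Hadoop counter in the map phase and read the aggregated total back on the client via RunningJob.getCounters() after the job finishes. The client can then write the single value into the 'results' table itself with a plain HTable update, instead of trying to do it inside a reducer.

// Sketch only: counts rows by incrementing a counter in the map phase, then
// reads the job-wide total back on the client once the job has completed.
// Names (RowCountSketch, Counter.ROWS) are illustrative, not from the thread.
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.RunningJob;

public class RowCountSketch {

    // Counter incremented by the map tasks and summed by the framework.
    public enum Counter { ROWS }

    // A real mapper would extend the HBase TableMap class so it scans the
    // table; the key/value types are left generic here on purpose.
    public static class CountMapper<K, V> extends MapReduceBase
            implements Mapper<K, V, NullWritable, NullWritable> {
        public void map(K key, V value,
                OutputCollector<NullWritable, NullWritable> output,
                Reporter reporter) throws IOException {
            // One increment per row seen by this task; Hadoop aggregates the
            // per-task values into a single job-wide total.
            reporter.incrCounter(Counter.ROWS, 1);
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(RowCountSketch.class);
        // ... configure the table input format / mapper here ...

        // runJob blocks until the job finishes and returns a handle to it.
        RunningJob job = JobClient.runJob(conf);

        // The summed counter is available on the client after completion; at
        // this point it can be written into the 'results' table directly.
        long totalRows = job.getCounters().getCounter(Counter.ROWS);
        System.out.println("Total rows: " + totalRows);
    }
}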
