Actually, there is a RowCounter under the mapred package. There is a bug in the 0.2.0 candidate release, but it was fixed yesterday. You may want to check the new one (see https://issues.apache.org/jira/browse/HBASE-791)
I would have done so but I probably have a bigger hdfs problem on my cluster :-)

Thanks
-Yair

-----Original Message-----
From: Pavel [mailto:[EMAIL PROTECTED]
Sent: Friday, August 01, 2008 8:37 AM
To: [email protected]
Subject: Re: help with reduce phase understanding

Thank you a lot for your answer, Jean-Daniel. I think I now understand how that scenario works.

I have another scenario (probably not doable with mapred, though) - I need to get the total row count for the whole table. I think I could use Reporter to increment a counter in the map phase, but how can I then get the counter value saved into the 'results' table? Can you please advise how I can achieve that? Also, what is the preferred way to get a table row count?

Thank you for your help!
Pavel

2008/8/1 Jean-Daniel Cryans <[EMAIL PROTECTED]>

> Pavel,
>
> Since each map processes only one region, a row is stored in only one
> region, and all intermediate keys from a given mapper go to a single
> reducer, there will be no stale data in this situation.
>
> J-D
>
> On Wed, Jul 30, 2008 at 10:09 AM, Pavel <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I feel I lack an understanding of the mapreduce approach and would like
> > to ask some questions (mainly on its reduce part). Below is a reduce job
> > that gets the value count for a given row key and inserts the resulting
> > value into another table using the same row key.
> >
> > What makes me doubt is that I cannot figure out how that code would work
> > if several reducers are running. Is it possible that they will process
> > values for the same row key and, as a consequence, write stale data into
> > the table? Say reducerA has counted a total of 5 messages while reducerB
> > counted 3 messages; would that all end up as a value of 8 in the
> > resulting table?
> >
> > Thank you.
> > Pavel
> >
> > public class MessagesTableReduce extends TableReduce<Text, LongWritable> {
> >
> >     public void reduce(Text key, Iterator<LongWritable> values,
> >             OutputCollector<Text, MapWritable> output, Reporter reporter)
> >             throws IOException {
> >
> >         System.out.println("REDUCE: processing messages for author: " +
> >                 key.toString());
> >
> >         int total = 0;
> >         while (values.hasNext()) {
> >             values.next();
> >             total++;
> >         }
> >
> >         MapWritable map = new MapWritable();
> >         map.put(new Text("messages:sent"), new
> >                 ImmutableBytesWritable(String.valueOf(total).getBytes()));
> >         output.collect(key, map);
> >     }
> > }
> >
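For the counter question in the thread above, one rough sketch (not tested against the 0.2 APIs; the class name, counter enum, and column names are made up for illustration) is to increment a Hadoop counter in the map phase and read the aggregated total back on the client via RunningJob.getCounters() after the job finishes. The client can then write the single value into the 'results' table itself with a plain HTable update, instead of trying to do it inside a reducer.

// Sketch only: counts rows by incrementing a counter in the map phase, then
// reads the job-wide total back on the client once the job has completed.
// Names (RowCountSketch, Counter.ROWS) are illustrative, not from the thread.
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.RunningJob;

public class RowCountSketch {

    // Counter incremented by the map tasks and summed by the framework.
    public enum Counter { ROWS }

    // A real mapper would extend the HBase TableMap class so it scans the
    // table; the key/value types are left generic here on purpose.
    public static class CountMapper<K, V> extends MapReduceBase
            implements Mapper<K, V, NullWritable, NullWritable> {
        public void map(K key, V value,
                OutputCollector<NullWritable, NullWritable> output,
                Reporter reporter) throws IOException {
            // One increment per row seen by this task; Hadoop aggregates the
            // per-task values into a single job-wide total.
            reporter.incrCounter(Counter.ROWS, 1);
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(RowCountSketch.class);
        // ... configure the table input format / mapper here ...

        // runJob blocks until the job finishes and returns a handle to it.
        RunningJob job = JobClient.runJob(conf);

        // The summed counter is available on the client after completion; at
        // this point it can be written into the 'results' table directly.
        long totalRows = job.getCounters().getCounter(Counter.ROWS);
        System.out.println("Total rows: " + totalRows);
    }
}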
