It was committed late last night, so it's fixed in TRUNK. Another big issue got fixed as well, so there is a good chance we'll see a release candidate 2 soon.
Pavel, FYI, doing a row count is really non-trivial in HBase. A scan over all
rows may take more than an hour because it is not distributed (it reads one
row after the other), so MapReduce is well suited for that.

J-D

On Fri, Aug 1, 2008 at 9:40 AM, Yair Even-Zohar <[EMAIL PROTECTED]> wrote:
> Actually, there is a RowCounter under the mapred package. There is a bug
> in the 0.2.0 release candidate, but this was fixed yesterday. You may
> want to check the new one (see
> https://issues.apache.org/jira/browse/HBASE-791).
>
> I would have done so, but I probably have a bigger HDFS problem on my
> cluster :-)
>
> Thanks
> -Yair
>
> -----Original Message-----
> From: Pavel [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 01, 2008 8:37 AM
> To: [email protected]
> Subject: Re: help with reduce phase understanding
>
> Thank you a lot for your answer, Jean-Daniel. I think now I understand
> how that scenario works.
>
> I have another scenario (probably not doable with MapReduce though): I
> need to get the total row count for the whole table. I think I could use
> Reporter to increment a counter in the map phase, but how can I get the
> counter value saved into the 'results' table after all? Can you please
> advise how I can achieve that? Also, what is the preferred way to get a
> table's row count?
>
> Thank you for your help!
> Pavel
>
> 2008/8/1 Jean-Daniel Cryans <[EMAIL PROTECTED]>
>
> > Pavel,
> >
> > Since each map processes only one region, a row is stored in only one
> > region, and all intermediate values for a given key go to a single
> > reducer, there will be no stale data in this situation.
> >
> > J-D
> >
> > On Wed, Jul 30, 2008 at 10:09 AM, Pavel <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I feel I lack understanding of the MapReduce approach and would like
> > > to ask some questions (mainly about its reduce part).
> > > Below is a reduce job that gets the count of values for a given row
> > > key and inserts the resulting value into another table using the
> > > same row key.
> > >
> > > What makes me doubt is that I cannot figure out how that code would
> > > work if several reducers are running. Is it possible that they will
> > > process values for the same row key and, as a consequence, write
> > > stale data into the table? Say reducerA has counted a total of 5
> > > messages while reducerB counted 3 messages; would that all end up
> > > with a value of 8 in the resulting table?
> > >
> > > Thank you.
> > > Pavel
> > >
> > > public class MessagesTableReduce extends TableReduce<Text, LongWritable> {
> > >
> > >     public void reduce(Text key, Iterator<LongWritable> values,
> > >             OutputCollector<Text, MapWritable> output, Reporter reporter)
> > >             throws IOException {
> > >
> > >         System.out.println("REDUCE: processing messages for author: "
> > >                 + key.toString());
> > >
> > >         int total = 0;
> > >         while (values.hasNext()) {
> > >             values.next();
> > >             total++;
> > >         }
> > >
> > >         MapWritable map = new MapWritable();
> > >         map.put(new Text("messages:sent"),
> > >                 new ImmutableBytesWritable(String.valueOf(total).getBytes()));
> > >         output.collect(key, map);
> > >     }
> > > }
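[Editor's note] The "no stale data" answer above rests on how Hadoop routes intermediate keys to reducers. Below is a minimal standalone sketch (no Hadoop dependency; class and key names are hypothetical) of the partition formula used by Hadoop's default HashPartitioner, illustrating why every value for a given row key reaches exactly one reducer regardless of which mapper emitted it:

```java
// Standalone sketch of Hadoop's default hash partitioning. No Hadoop
// dependency; the formula mirrors HashPartitioner:
// (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
public class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner.getPartition().
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        String key = "author-42"; // hypothetical row key

        // Two different map tasks emitting the same key compute the
        // same partition, so one reducer sees all values for that key.
        int fromMapperA = partitionFor(key, reducers);
        int fromMapperB = partitionFor(key, reducers);
        if (fromMapperA != fromMapperB) {
            throw new AssertionError("same key routed to different reducers");
        }
        System.out.println(key + " -> reducer " + fromMapperA);
    }
}
```

Because the partition depends only on the key, the scenario Pavel describes (reducerA counting 5 and reducerB counting 3 for the same key) cannot occur under the default partitioner.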

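[Editor's note] The reason a MapReduce row count beats a plain scan, as J-D explains, is that each map task counts one region in parallel and a reduce step sums the partial counts. The sketch below simulates that idea with in-memory lists (all names are hypothetical; this is not the actual RowCounter code):

```java
import java.util.Arrays;
import java.util.List;

// Standalone simulation of the idea behind a MapReduce row count:
// each "map" counts the rows of one region (these run in parallel on a
// real cluster), and a "reduce" sums the partial counts.
public class RowCountSketch {

    // Map phase: count the rows of a single region.
    static long countRegion(List<String> regionRows) {
        return regionRows.size();
    }

    // Reduce phase: sum the partial counts emitted by the mappers.
    static long sumCounts(long[] partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical regions of the same table.
        List<String> region1 = Arrays.asList("row1", "row2", "row3");
        List<String> region2 = Arrays.asList("row4", "row5");

        long[] partials = { countRegion(region1), countRegion(region2) };
        System.out.println("total rows: " + sumCounts(partials)); // prints "total rows: 5"
    }
}
```

A sequential scan touches every row one after the other, while this scheme does one pass per region concurrently, which is why RowCounter in the mapred package is the preferred approach for large tables.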