The javadoc <http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/mapreduce/package-summary.html> includes a fair amount of information. There are also some tests <http://svn.apache.org/repos/asf/hadoop/hbase/branches/0.20/src/test/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java> in the HBase codebase...
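As a rough illustration (not something taken from the javadoc or the tests), a job that counts rows per user with the org.apache.hadoop.hbase.mapreduce API could look like the sketch below. The table name, the fixed-length userid prefix in the rowkey, and all class names are assumptions for the example, so adjust them to your schema:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UserRowCount {

  // Emits (userid, 1) for every row. Assumes the first 8 bytes of the
  // rowkey are the userid -- adjust to however your keys are encoded.
  static class CountMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text user = new Text();

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      user.set(Bytes.toString(row.get(), row.getOffset(), 8));
      context.write(user, ONE);
    }
  }

  // Plain Hadoop reducer that sums the per-user counts.
  static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text user, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(user, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new HBaseConfiguration();
    Job job = new Job(conf, "per-user row count");
    job.setJarByClass(UserRowCount.class);

    Scan scan = new Scan();
    scan.setCaching(500);  // fetch rows from the region servers in bigger batches

    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, CountMapper.class, Text.class, IntWritable.class, job);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The mapper emits (userid, 1) for every row and an ordinary Hadoop reducer sums the counts; sorting that output by count and taking the first 100 entries gives the top users.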
If you haven't tried map reduce before then I'd suggest starting at:
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html

Cheers,
Dan

On 15 February 2010 14:44, Sujee Maniyam <su...@sujee.net> wrote:
> A few hundred million rows for now, and will be more in the future.
>
> The map-reduce proposal sounds very interesting. Any pointers on running
> MR jobs on data stored in HBase?
>
> Thanks very much,
> Sujee
>
> On Sun, Feb 14, 2010 at 2:29 PM, Dan Washusen <d...@reactive.org> wrote:
> > Hi Sujee,
> > How much data do you have in your table? Keeping a count in memory has
> > its obvious problems, but if it's a small table then I guess it would work...
> >
> > How fast do you need to get this information? Maybe a map reduce job
> > would be a better way of doing it?
> >
> > Cheers,
> > Dan
> >
> > On 14 February 2010 19:56, Sujee Maniyam <su...@sujee.net> wrote:
> >> Hi,
> >>
> >> I have a table whose rowkey is composed of userid + timestamp. I need
> >> to figure out the 'top-100' users.
> >>
> >> One approach is running a scanner and keeping a hashmap of user counts
> >> in memory.
> >>
> >> Wondering if there is an HBase trick I could use?
> >>
> >> Thanks,
> >> Sujee
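For reference, the scan-plus-HashMap approach described in the original question above might look roughly like the sketch below; again the table name and the 8-byte userid prefix are assumptions, and the whole per-user map has to fit in memory on the client:

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanUserCount {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");

    Scan scan = new Scan();
    scan.setCaching(1000);  // pull rows from the server in larger batches

    Map<String, Long> counts = new HashMap<String, Long>();
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result row : scanner) {
        // Assumes the first 8 bytes of the rowkey are the userid.
        String user = Bytes.toString(row.getRow(), 0, 8);
        Long current = counts.get(user);
        counts.put(user, current == null ? 1L : current + 1);
      }
    } finally {
      scanner.close();
    }
    // counts now holds a total per user; sort the entries by value and
    // keep the first 100 to get the top users.
  }
}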