This looks like a variant of word count.

In the map you filter out the rows you are interested in, and emit
"ELECTRONICS|TV" as the key and just about anything as the value.  Then in
the reduce you count how many values there are (the iterator holds every
value that was emitted under that one key), then do the batch update as
you have below.
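Not from the original thread: here is a minimal, Hadoop-free sketch of that
pattern, assuming row keys of the TYPE|VALUE|ID form from the question. The
"map" step strips the trailing ID to get the "ELECTRONICS|TV" entity key, and
the "reduce" step counts how many values landed under each key. The helper
names (extractEntity, countByEntity) are mine, not from the poster's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EntityCount {

    // "Map" side: ELECTRONICS|TV|ID2343 -> ELECTRONICS|TV
    // (hypothetical helper, mirroring the poster's extractEntity)
    static String extractEntity(String rowKey) {
        return rowKey.substring(0, rowKey.lastIndexOf('|'));
    }

    // Simulates shuffle + "reduce": group by entity key, count values per key.
    static Map<String, Integer> countByEntity(List<String> rowKeys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : rowKeys) {
            counts.merge(extractEntity(key), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        rows.add("ELECTRONICS|TV|ID2343");
        rows.add("ELECTRONICS|TV|ID234324");
        rows.add("ELECTRONICS|DVD|ID99");
        System.out.println(countByEntity(rows));
    }
}
```

In the real job the counting happens inside reduce(), where the framework has
already grouped the values by key, so no previousRow/counter state is needed
across calls.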



On Fri, Jun 12, 2009 at 10:04 AM, llpind <[email protected]> wrote:

>
> I believe my map is collecting per row correctly, but reduce doesn't seem to
> be doing anything:
> =============================================================
>
>         private RowResult previousRow = null;  // keep previous row
>         private int counter = 0;  // counter for adding up like TYPE|VALUE
>
>         @Override
>         public void reduce(ImmutableBytesWritable k,
>                         Iterator<RowResult> v,
>                         OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
>                         Reporter r) throws IOException {
>
>                 // keep counter for equal entities. row takes TYPE|VALUE|LINKID form
>                 while (v.hasNext()) {
>                         RowResult currentRow = v.next();
>                         if (previousRow == null) {
>                                 previousRow = currentRow;
>                         }
>                         if (extractEntity(currentRow).equals(extractEntity(previousRow))) {
>                                 ++counter;
>                         } else {
>                                 // commit previous row size, set previous to current & reset counter
>                                 BatchUpdate bu = new BatchUpdate(extractEntity(previousRow));
>                                 bu.put("count:" + counter, String.valueOf(counter).getBytes());
>                                 c.collect(new ImmutableBytesWritable(previousRow.getRow()), bu);
>                                 previousRow = currentRow;
>                                 counter = 0;
>                         }
>                 }
>         }
>
> ==============================================
> What am I doing wrong?
>
> The extract is simply getting the TYPE|VALUE only.
>
> What exactly do I have in the Iterator<RowResult> at this point?
>
> Thanks
>
> llpind wrote:
> >
> > If i have a tall table, what is returned in the reduce?   I'm still
> > confused as to how things map up.
> >
> > for example assume I have ELECTRONICS|TV|ID2343 as the row key. There are
> > millions of these (ELECTRONICS|TV|ID234324, along with other products).
> > I'd like to count the total # of IDs for all TVs.  How do I do this with
> > map/reduce?  I tried a few things, but not able to get it working.
> >
> >
> >
> > Ryan Rawson wrote:
> >>
> >> Also remember you might be able to convert to a tall table. Row keys
> >> can be compound and you can do partial left matches on them. E.g.:
> >>
> >> Userid:timestamp:eventid
> >>
> >> Now you have a tall table. Do prefix matches on the userid you want
> >> and you get results in chronological order.
> >>
> >> You can build equivalent indexes in HBase as in SQL. You may find a
> >> design like this alleviates the need for extremely wide rows.
> >>
> >> Good luck!
> >>
> >> On Jun 11, 2009 11:44 AM, "Billy Pearson" <[email protected]>
> >> wrote:
> >>
> >> That might be a good idea, but you might also be able to redesign the
> >> layout of the table using a different key than the current one;
> >> worth brainstorming.
> >>
> >> Billy
> >>
> >>
> >>
> >> "llpind" <[email protected]> wrote in message
> >> news:[email protected]...
> >>
> >> Sorry, I forgot to mention the overflow then overflows into new row
> >> keys per 10,000 column entries ...
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p24002766.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>
