Thanks Ryan.  You're right, it is very much like word count.  Here is what I
have:

private final static IntWritable one = new IntWritable(1);

MAPPER
=====================================================
                @Override
                public void map(ImmutableBytesWritable key,
                                RowResult row,
                                OutputCollector<ImmutableBytesWritable, IntWritable> collector,
                                Reporter r) throws IOException {
                        // extractEntity trims TYPE|VALUE|ID down to just TYPE|VALUE
                        collector.collect(new ImmutableBytesWritable(extractEntity(key.get())), one);
                }
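
(For reference, extractEntity just cuts the row key at the last '|'.  A
minimal sketch, assuming '|'-delimited UTF-8 keys -- my real version may
differ slightly:)

```java
import java.nio.charset.StandardCharsets;

public class EntityExtractor {
    // Trims TYPE|VALUE|ID down to TYPE|VALUE by dropping everything
    // from the last '|' onward.
    public static byte[] extractEntity(byte[] rowKey) {
        String key = new String(rowKey, StandardCharsets.UTF_8);
        int lastPipe = key.lastIndexOf('|');
        if (lastPipe < 0) {
            return rowKey;  // no ID segment present; leave the key as-is
        }
        return key.substring(0, lastPipe).getBytes(StandardCharsets.UTF_8);
    }
}
```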


REDUCER
=====================================================

        public static class Reducer extends MapReduceBase implements
                        TableReduce<ImmutableBytesWritable, IntWritable> {

                @Override
                public void reduce(ImmutableBytesWritable k,
                                Iterator<IntWritable> v,
                                OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
                                Reporter r) throws IOException {

                        BatchUpdate bu = new BatchUpdate(k.get());
                        int sum = 0;
                        while (v.hasNext()) {
                                sum += v.next().get();
                        }
                        // note: the sum ends up in the column qualifier as well as the value
                        bu.put("count:" + sum, String.valueOf(sum).getBytes());
                        c.collect(k, bu);
                }
        }

=====================================================

            TableMapReduceUtil.initTableMapJob(inputTableName, "colFam:", Mapper.class,
                      ImmutableBytesWritable.class, IntWritable.class, c);

            TableMapReduceUtil.initTableReduceJob("output_count_table", Reducer.class, c);


======================================================
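
In case it helps, those two calls sit inside an otherwise ordinary driver.
A sketch of the surrounding setup (CountJob is a placeholder class name, not
my actual code, and error handling is omitted):

```java
// Sketch of the driver around the two init calls above (0.19-era
// Hadoop/HBase API); CountJob is a placeholder name.
JobConf c = new JobConf(CountJob.class);
c.setJobName("entity occurrence count");

TableMapReduceUtil.initTableMapJob(inputTableName, "colFam:", Mapper.class,
        ImmutableBytesWritable.class, IntWritable.class, c);
TableMapReduceUtil.initTableReduceJob("output_count_table", Reducer.class, c);

JobClient.runJob(c);
```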

The input table has just one column family, which isn't even necessary.  The
output table also has just one column family, 'count'.  The goal is to put a
single entry in the output table for each entity, along with its occurrence
count.  So the input table has row keys like TYPE|VALUE|1, TYPE|VALUE|2, etc.
(with possibly millions), and the output table should have the row key
TYPE|VALUE with the value 2.
The problem I'm having is that I don't get the correct count; it's close, but
not correct.  Is there something I'm doing incorrectly above?
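
To restate the goal concretely: ignoring HBase, the whole job should behave
like this plain-Java grouping and counting (with hypothetical sample keys):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CountSketch {
    // What map + reduce should compute overall: one count per TYPE|VALUE entity.
    public static Map<String, Integer> countEntities(String[] rowKeys) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String key : rowKeys) {
            int lastPipe = key.lastIndexOf('|');
            // same trim the mapper does: TYPE|VALUE|ID -> TYPE|VALUE
            String entity = (lastPipe < 0) ? key : key.substring(0, lastPipe);
            Integer prev = counts.get(entity);
            counts.put(entity, prev == null ? 1 : prev + 1);
        }
        return counts;
    }
}
```

So for row keys TYPE|VALUE|1 and TYPE|VALUE|2, the output row TYPE|VALUE
should carry the value 2.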

I'm open to any suggestions. Thanks.


Ryan Rawson wrote:
> 
> This looks like a variant of word count.
> 
> In the map you filter out the rows you are interested in, and emit
> "ELECTRONICS|TV" as the key and just about anything as the value.  Then in
> the reduce you count how many values there are, then do the batch update as
> you have below.
> 
> 
> 
> On Fri, Jun 12, 2009 at 10:04 AM, llpind <[email protected]> wrote:
> 
>>
>> I believe my map is collecting per row correctly, but reduce doesn't seem
>> to be doing anything:
>> =============================================================
>>
>>                private RowResult previousRow = null;  // keep previous row
>>                private int counter = 0;  // counter for adding up like TYPE|VALUE
>>
>>                @Override
>>                public void reduce(ImmutableBytesWritable k,
>>                                Iterator<RowResult> v,
>>                                OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
>>                                Reporter r) throws IOException {
>>
>>                        // keep counter for equal entities; row takes TYPE|VALUE|LINKID form
>>                        while (v.hasNext()) {
>>                                RowResult currentRow = v.next();
>>                                if (previousRow == null) {
>>                                        previousRow = currentRow;
>>                                }
>>                                if (extractEntity(currentRow).equals(extractEntity(previousRow))) {
>>                                        ++counter;
>>                                } else {
>>                                        // commit previous row count, set previous to current & reset counter
>>                                        BatchUpdate bu = new BatchUpdate(extractEntity(previousRow));
>>                                        bu.put("count:" + counter, String.valueOf(counter).getBytes());
>>                                        c.collect(new ImmutableBytesWritable(previousRow.getRow()), bu);
>>                                        previousRow = currentRow;
>>                                        counter = 0;
>>                                }
>>                        }
>>                }
>>
>> ==============================================
>> What am I doing wrong?
>>
>> The extract is simply getting the TYPE|VALUE only.
>>
>> What exactly do I have in the Iterator<RowResult> at this point?
>>
>> Thanks
>>
>> llpind wrote:
>> >
>> > If i have a tall table, what is returned in the reduce?   I'm still
>> > confused as to how things map up.
>> >
>> > for example assume I have ELECTRONICS|TV|ID2343 as the row key. There
>> are
>> > millions of these (ELECTRONICS|TV|ID234324, along with other products).
>> > I'd like to count the total # of IDs for all TVs.  How do I do this
>> with
>> > map/reduce?  I tried a few things, but not able to get it working.
>> >
>> >
>> >
>> > Ryan Rawson wrote:
>> >>
>> >> Also remember you might be able to convert to a tall table. Row keys
>> can
>> >> be
>> >> compound and you can do partial left matches on them. Eg:
>> >>
>> >> Userid:timestamp:eventid
>> >>
>> >> now you have a tall table. Do prefix matches on the userid you want
>> and
>> >> you
>> >> get results in chronological order.
>> >>
>> >> You can build equivalent indexes in hbase as in sql. You may find a
>> >> design like this alleviates the need for extremely wide rows.
>> >>
>> >> Good luck!
>> >>
>> >> On Jun 11, 2009 11:44 AM, "Billy Pearson" <[email protected]>
>> >> wrote:
>> >>
>> >> That might be a good idea, but you might be able to redesign your
>> >> layout of the table using a different key than the current one; worth
>> >> brainstorming.
>> >>
>> >> Billy
>> >>
>> >>
>> >>
>> >> "llpind" <[email protected]> wrote in message
>> >> news:[email protected]...
>> >>
>> >> Sorry I forgot to mention the overflow then overflows into new row
>> keys
>> >> per
>> >> 10,000 column entries ...
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p24002766.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p24042336.html
Sent from the HBase User mailing list archive at Nabble.com.
