Thanks Ryan. You are right, it is very much like word count. Here is what I
have:
MAPPER
=====================================================
public static class Mapper extends MapReduceBase
    implements TableMap<ImmutableBytesWritable, IntWritable> {

  private final static IntWritable one = new IntWritable(1);

  @Override
  public void map(ImmutableBytesWritable key, RowResult row,
      OutputCollector<ImmutableBytesWritable, IntWritable> collector,
      Reporter r) throws IOException {
    // extractEntity trims TYPE|VALUE|ID to just TYPE|VALUE
    collector.collect(
        new ImmutableBytesWritable(extractEntity(key.get())), one);
  }
}
REDUCER
=====================================================
public static class Reducer extends MapReduceBase
    implements TableReduce<ImmutableBytesWritable, IntWritable> {

  @Override
  public void reduce(ImmutableBytesWritable k,
      Iterator<IntWritable> v,
      OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
      Reporter r) throws IOException {
    // add up the 1s emitted for this TYPE|VALUE key
    int sum = 0;
    while (v.hasNext()) {
      sum += v.next().get();
    }
    BatchUpdate bu = new BatchUpdate(k.get());
    bu.put("count:" + sum, String.valueOf(sum).getBytes());
    c.collect(k, bu);
  }
}
=====================================================
TableMapReduceUtil.initTableMapJob(inputTableName, "colFam:",
    Mapper.class, ImmutableBytesWritable.class, IntWritable.class, c);
TableMapReduceUtil.initTableReduceJob("output_count_table",
    Reducer.class, c);
======================================================
The input table has just one column family, which isn't even strictly
necessary. The output table also has just one column family, 'count'. The
goal is to put a single row in the output table for each distinct
TYPE|VALUE, along with its occurrence count. So the input table has row keys
like TYPE|VALUE|1, TYPE|VALUE|2, etc. (possibly millions of them), and the
output table should end up with row key TYPE|VALUE and value 2. The problem
I'm having is that I don't get the correct count; it's close, but not
correct. Is there something I'm doing incorrectly above?
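One thing I notice looking at it again: the column qualifier "count:" + sum
embeds the sum itself, so every distinct total lands in a differently named
column instead of one fixed count column. For reference, here is roughly
what I think the reduce should look like with a fixed qualifier (just a
sketch; the name "count:total" is a placeholder I made up):
=====================================================
@Override
public void reduce(ImmutableBytesWritable k,
    Iterator<IntWritable> v,
    OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
    Reporter r) throws IOException {
  // add up the 1s the mapper emitted for this TYPE|VALUE key
  int sum = 0;
  while (v.hasNext()) {
    sum += v.next().get();
  }
  BatchUpdate bu = new BatchUpdate(k.get());
  // fixed qualifier instead of "count:" + sum
  bu.put("count:total", String.valueOf(sum).getBytes());
  c.collect(k, bu);
}
=====================================================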
I'm open to any suggestions. Thanks.
Ryan Rawson wrote:
>
> This looks like a variant of word count.
>
> In the map you filter out the rows you are interested in, and emit
> "ELECTRONICS|TV" as the key and just about anything as the value. Then in
> the reduce you count how many values there are, then do the batch update
> as you have below.
>
>
>
> On Fri, Jun 12, 2009 at 10:04 AM, llpind <[email protected]> wrote:
>
>>
>> I believe my map is collecting per row correctly, but reduce doesn't seem
>> to be doing anything:
>> =============================================================
>>
>> private RowResult previousRow = null; // keep previous row
>> private int counter = 0; // counter for adding up like TYPE|VALUE
>>
>> @Override
>> public void reduce(ImmutableBytesWritable k,
>>     Iterator<RowResult> v,
>>     OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
>>     Reporter r) throws IOException {
>>
>>   // keep counter for equal entities. row takes TYPE|VALUE|LINKID form
>>   while (v.hasNext()) {
>>     RowResult currentRow = v.next();
>>     if (previousRow == null) {
>>       previousRow = currentRow;
>>     }
>>     if (extractEntity(currentRow).equals(extractEntity(previousRow))) {
>>       ++counter;
>>     } else {
>>       // commit previous row size, set previous to current & reset counter
>>       BatchUpdate bu = new BatchUpdate(extractEntity(previousRow));
>>       bu.put("count:" + counter, String.valueOf(counter).getBytes());
>>       c.collect(new ImmutableBytesWritable(previousRow.getRow()), bu);
>>       previousRow = currentRow;
>>       counter = 0;
>>     }
>>   }
>> }
>>
>> ==============================================
>> What am I doing wrong?
>>
>> The extract is simply getting the TYPE|VALUE only.
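>>
>> It's roughly like this (sketch from memory; assumes '|' never occurs
>> inside TYPE or VALUE):
>>
>> private static byte[] extractEntity(byte[] rowKey) {
>>   String s = new String(rowKey);
>>   // keep everything up to the second '|': TYPE|VALUE|ID -> TYPE|VALUE
>>   return s.substring(0, s.indexOf('|', s.indexOf('|') + 1)).getBytes();
>> }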
>>
>> What exactly do I have in the Iterator<RowResult> at this point?
>>
>> Thanks
>>
>> llpind wrote:
>> >
>> > If I have a tall table, what is returned in the reduce? I'm still
>> > confused as to how things map up.
>> >
>> > For example, assume I have ELECTRONICS|TV|ID2343 as the row key. There
>> > are millions of these (ELECTRONICS|TV|ID234324, along with other
>> > products). I'd like to count the total # of IDs for all TVs. How do I
>> > do this with map/reduce? I tried a few things, but wasn't able to get
>> > it working.
>> >
>> >
>> >
>> > Ryan Rawson wrote:
>> >>
>> >> Also remember you might be able to convert to a tall table. Row keys
>> >> can be compound, and you can do partial left matches on them. E.g.:
>> >>
>> >> Userid:timestamp:eventid
>> >>
>> >> Now you have a tall table. Do prefix matches on the userid you want,
>> >> and you get results in chronological order.
>> >>
>> >> You can build equivalent indexes in HBase as in SQL. You may find a
>> >> design like this alleviates the need for extremely wide rows.
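>> >>
>> >> A prefix scan then looks roughly like this (untested sketch; the
>> >> table name "events" and column "cf:" are made up, and it just stops
>> >> by hand once rows no longer match the prefix):
>> >>
>> >> HTable table = new HTable(new HBaseConfiguration(), "events");
>> >> String prefix = "userid123:";
>> >> Scanner scanner = table.getScanner(new String[] { "cf:" }, prefix);
>> >> try {
>> >>   for (RowResult row : scanner) {
>> >>     // rows come back sorted by key, so per-user they're chronological
>> >>     if (!new String(row.getRow()).startsWith(prefix)) {
>> >>       break; // past the last row for this userid
>> >>     }
>> >>     // ... process row ...
>> >>   }
>> >> } finally {
>> >>   scanner.close();
>> >> }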
>> >>
>> >> Good luck!
>> >>
>> >> On Jun 11, 2009 11:44 AM, "Billy Pearson" <[email protected]>
>> >> wrote:
>> >>
That might be a good idea, but you might also be able to redesign the
layout of the table using a different key than the current one; worth
brainstorming.
>> >>
>> >> Billy
>> >>
>> >>
>> >>
>> >> "llpind" <[email protected]> wrote in message
>> >> news:[email protected]...
>> >>
Sorry, I forgot to mention: the overflow then spills into new row keys per
10,000 column entries ...
>> >>
>> >>
>> >
>> >
>>
>>
>>
>
>