Re: Map Reduce over HBase - sample code

stack Tue, 24 Jun 2008 07:58:04 -0700

Naama Kraus wrote:

..
What if the mission was the following - for each course in the table,
calculate the average grade in that course. In that case both map and reduce
are required, is that correct ? Map will emit for each row a {course_name,
grade} pair. Reduce will emit the average grades for each course
{course_name, avg_grade}. Output can be put in a separate table (probably
one holding courses information). Does this make sense ?

That'll work.
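
As a rough, framework-free sketch of that job (plain Python standing in for
Hadoop and hbase; the row layout and the names map_row, reduce_course and
run_job are made up for illustration, not any real API): the map emits one
{course_name, grade} pair per row, the grouping step plays the part of the MR
sort, and the reduce averages each course.

```python
from collections import defaultdict

# Hypothetical rows: (row_key, course_name, grade).
# A real job would read these from a table scan instead.
ROWS = [
    ("student1", "math", 90),
    ("student2", "math", 70),
    ("student1", "physics", 85),
    ("student3", "physics", 95),
    ("student2", "art", 60),
]


def map_row(row):
    """Map: emit one (course_name, grade) pair for a row."""
    _row_key, course, grade = row
    yield course, grade


def reduce_course(course, grades):
    """Reduce: emit (course_name, avg_grade) for one course."""
    grades = list(grades)
    return course, sum(grades) / len(grades)


def run_job(rows):
    """Stand-in for the framework: group map output by key, then reduce."""
    grouped = defaultdict(list)
    for row in rows:
        for course, grade in map_row(row):
            grouped[course].append(grade)
    # Reduce keys are visited in sorted order, as the MR sort would deliver them.
    return [reduce_course(course, grouped[course]) for course in sorted(grouped)]


averages = run_job(ROWS)
```

In a real job the list returned by run_job would instead be written to the
separate courses table.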

* At a higher level, I'd suggest a refactoring.  Do all of your work in
the map phase.  Have no reduce phase.  I suggest this because as is, all
rows emitted by the map are being sorted by the MR framework.  But hbase
will also do a sort on insert.  Avoid paying the price of the MR sort.  Do
your calculation in the map and then insert the result at map time.   Either
emit nothing or, emit a '1' for every row processed so the MR counters tell
a story about your MR job.*


That's an interesting point. So if both map and reduce are required, then
two sorts must take place. Is that correct ?

Yes, but with your new example the two sorts serve different ends: the first
collects all the data for each course together, and the second orders the
courses in hbase lexicographically (presuming course is the primary key).
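
To make the two sorts concrete, here is a toy sketch in plain Python. Nothing
in it is the hbase or Hadoop API: SortedTable is a hypothetical stand-in for a
table that keeps rows ordered by key on insert, and the sample data is
invented. The first sort brings each course's grades together for the reduce;
the second happens inside the table, whatever order the results arrive in.

```python
import bisect
from itertools import groupby
from operator import itemgetter


class SortedTable:
    """Toy stand-in for a table that keeps rows ordered by key on insert."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        # Sort 2: each new key is placed at its sorted position.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self):
        return [(key, self._values[key]) for key in self._keys]


# Hypothetical map output, in the arbitrary order the maps produced it.
map_output = [("physics", 85), ("math", 90), ("art", 60),
              ("math", 70), ("physics", 95)]

# Sort 1: order by key so that all grades of one course are adjacent.
shuffled = sorted(map_output, key=itemgetter(0))

averages = {}
for course, pairs in groupby(shuffled, key=itemgetter(0)):
    grades = [grade for _, grade in pairs]
    averages[course] = sum(grades) / len(grades)

# Insert in a deliberately scrambled order; the table still scans in key order.
table = SortedTable()
for course in ["physics", "art", "math"]:
    table.put(course, averages[course])

result = table.scan()
```

The scan comes back ordered by course name even though the puts were not,
which is why the order in which reduce output arrives does not matter to the
table.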


St.Ack
