Thanks again, Naama

On Tue, Jun 24, 2008 at 5:57 PM, stack <[EMAIL PROTECTED]> wrote:

> Naama Kraus wrote:
>
>> ..
>> What if the mission were the following: for each course in the table,
>> calculate the average grade in that course. In that case both map and
>> reduce are required, is that correct? Map will emit a {course_name,
>> grade} pair for each row. Reduce will emit the average grade for each
>> course, {course_name, avg_grade}. The output can be put in a separate
>> table (probably one holding course information). Does this make sense?
>>
> That'll work.
>
>>> * At a higher level, I'd suggest a refactoring.  Do all of your work in
>>> the map phase.  Have no reduce phase.  I suggest this because, as is, all
>>> rows emitted by the map are being sorted by the MR framework.  But hbase
>>> will also do a sort on insert.  Avoid paying the price of the MR sort.
>>> Do your calculation in the map and then insert the result at map time.
>>> Either emit nothing or emit a '1' for every row processed so the MR
>>> counters tell a story about your MR job.*
>>>
>>>
>>>
>>
>> That's an interesting point. So if both map and reduce are required,
>> then two sorts must take place. Is that correct?
>>
>>
> Yes, but in your new example the two sorts serve different ends: the
> first collects together all the data for each course, and the second
> orders courses in hbase lexicographically (presuming course is the
> primary key).
>
> St.Ack
>

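For reference, here is a minimal sketch of the average-grade job discussed above. It is plain Python, not the Hadoop or HBase API. The row layout, the function names and the sample grades are all made up for illustration. The `shuffle` function stands in for the sort the MR framework does between map and reduce. The sort that hbase does on insert is not modelled here.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit one (course_name, grade) pair per table row."""
    for row in rows:
        yield row["course"], row["grade"]


def shuffle(pairs):
    """Stand-in for the framework sort: group grades by course, ordered by key."""
    grouped = defaultdict(list)
    for course, grade in pairs:
        grouped[course].append(grade)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Emit one (course_name, avg_grade) pair per course."""
    for course, grades in grouped:
        yield course, sum(grades) / len(grades)


# Hypothetical sample rows; a real job would read these from the table.
rows = [
    {"student": "s1", "course": "math", "grade": 90},
    {"student": "s2", "course": "math", "grade": 70},
    {"student": "s1", "course": "art", "grade": 85},
]

averages = dict(reduce_phase(shuffle(map_phase(rows))))
```

The reduce step is needed here because an average can only be computed once all grades for a course have been brought together, which is exactly what the first sort provides.
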


-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)
