to mapreduce or not to mapreduce?

Pavel Lysov Tue, 29 Jul 2008 09:52:41 -0700

Hey all!

I'd like to ask you to take a look at what I have and advise whether a map reduce approach is the right direction to take.

There's a MESSAGES table, where each message has a sender and a recipient. It works nicely so far, and next I want to get the following info:


Total of messages user X has sent
Total of messages user X has received
Total of messages in the system

I'd store this in a USERS table, with USER_ID as the row key and a 'messages' column family:

  messages:total_sent 345
  messages:total_received 543

Similarly, I'd create a SYSTEM table with a 'messages:total' column that holds the total count of messages.

Next, I think I should implement a map reduce job that updates 'messages:total_sent' and 'messages:total_received' for every user. The map step would add a one to the output collector for the given user id, and the reduce step would sum them up and update the user's row. Is that a good way to do it? Could it cause any problems if two or more concurrent reduce jobs try to update the same row?

I have a similar question for the SYSTEM table: what if a bunch of reduce jobs try to update the 'messages:total' column at the same time?

I think table locks would help there, but it seems I am missing some basic understanding of how all of this is supposed to work. Could you please advise?
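
To make the plan concrete, here is a rough sketch of the counting logic in plain Python, with no Hadoop or HBase involved. The sample rows, the function names, and the reserved "SYSTEM" key are all made up for illustration, and the shuffle step only imitates the grouping by key that the framework does between map and reduce. It is meant to show the shape of the job, not how the real one would be written.

```python
from collections import defaultdict

# Hypothetical sample rows from the MESSAGES table: (sender, recipient).
MESSAGES = [("alice", "bob"), ("alice", "carol"), ("bob", "alice")]


def map_message(sender, recipient):
    """Map step: emit a 1 for every counter that this message touches."""
    yield (sender, "total_sent"), 1
    yield (recipient, "total_received"), 1
    # "SYSTEM" is an assumed reserved key for the system-wide total.
    yield ("SYSTEM", "total"), 1


def shuffle(pairs):
    """Imitate the framework: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the ones collected for a single key."""
    return key, sum(values)


def run_job(messages):
    """Run map, shuffle and reduce in-process over the given rows."""
    pairs = (pair for s, r in messages for pair in map_message(s, r))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


counts = run_job(MESSAGES)
for (row, column), total in sorted(counts.items()):
    print(row, "messages:" + column, total)
```

In this sketch every key ends up in exactly one reduce call, so within a single run each counter is written once. What I'm unsure about is what happens when several such jobs run at the same time.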


I appreciate your help!
Pavel Lysov
[EMAIL PROTECTED]
