Hey all!

I'd like to ask you to take a look at what I have and advise whether a MapReduce approach is the right direction to take.

There's a MESSAGES table; each message has a sender and a recipient. It works nicely so far, and next I want to get the following info:

Total number of messages user X has sent
Total number of messages user X has received
Total number of messages in the system

I would store the per-user totals in a USERS table, with USER_ID as the row key and a 'messages' column family:
  messages:total_sent 345
  messages:total_received 543

Similarly, I'd create a SYSTEM table with a 'messages:total' column that holds the total count of messages.

Next, I think I should implement a MapReduce job that updates 'messages:total_sent' and 'messages:total_received' for every user. The map step would emit a one to the output collector for the given user ID, and the reduce step would sum those ones and update the user's row. Is that a good way to do it? Could it cause any problems if two or more concurrent reduce tasks try to update the same row?
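To make the idea concrete, here is a rough sketch of the counting logic in plain Python, not real Hadoop or HBase code. The sample data and the function names (map_message, shuffle, reduce_counts, run_job) are made up for illustration, and the in-memory dict only stands in for the USERS table.

```python
from collections import defaultdict

# Hypothetical sample of MESSAGES rows as (sender, recipient) pairs.
MESSAGES = [("alice", "bob"), ("alice", "carol"), ("bob", "alice")]


def map_message(sender, recipient):
    """Map step: emit a 1 for the sender and a 1 for the recipient."""
    yield (sender, "total_sent"), 1
    yield (recipient, "total_received"), 1


def shuffle(pairs):
    """Group values by key, standing in for what the framework does
    between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum all the ones emitted for a (user, column) key."""
    return key, sum(values)


def run_job(messages):
    """Run map, shuffle and reduce, then build rows shaped like USERS."""
    pairs = [kv for s, r in messages for kv in map_message(s, r)]
    users = defaultdict(dict)
    for key, values in shuffle(pairs).items():
        (user_id, column), total = reduce_counts(key, values)
        users[user_id]["messages:" + column] = total
    return dict(users)


USERS = run_job(MESSAGES)
```

In this sketch each (user, column) key is reduced exactly once, so a row is never written from two places. That is the property I am hoping the real framework gives me, but I am not sure it does.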

The same question applies to the SYSTEM table: what happens if a bunch of reduce tasks try to update the 'messages:total' column at the same time?
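For the system-wide total, one option I can imagine is to emit every count under a single constant key, so that partial sums are merged in one final step instead of many writers touching the same cell. Below is a plain-Python sketch of that idea under my own assumptions: the sample data, the function names and the 'splits' parameter are all hypothetical, and nothing here talks to a real table.

```python
# Hypothetical sample of MESSAGES rows as (sender, recipient) pairs.
SAMPLE_MESSAGES = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "alice"),
    ("carol", "bob"),
]


def map_total(sender, recipient):
    """Map step: every message counts once under one constant key."""
    yield "messages:total", 1


def partial_sum(chunk):
    """Sum the counts for one input split (similar in spirit to a combiner)."""
    return sum(value for s, r in chunk for _key, value in map_total(s, r))


def system_total(messages, splits=2):
    """Split the input, sum each split separately, then add the partial
    sums in a single final reduce step."""
    chunks = [messages[i::splits] for i in range(splits)]
    partials = [partial_sum(chunk) for chunk in chunks]
    return sum(partials)


TOTAL = system_total(SAMPLE_MESSAGES)
```

Here only the final step would write 'messages:total', so there would be a single writer for that cell. Whether the real job can be arranged this way is part of what I am asking.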

I think table locks would help there, but it seems I am missing some basic understanding of how all this is supposed to work. Could you please advise?

I appreciate your help!
Pavel Lysov
[EMAIL PROTECTED]


