Raghava,
What you ask should be easily doable using HBase-specific input and output
formats. I believe they already exist in a form that you can use with minimal
or no modification. For the specifics of the format classes, check HBase's API
documentation. Also, you are better off posting HBase-specific questions on
the HBase user mailing list.
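
To make the data flow concrete, here is a minimal sketch in plain Java that
only simulates the scheme in memory. It does not touch HBase or Hadoop at all:
the class IterativeStoreSketch, its Cell type, and the doubling "map" and
summing "reduce" steps are all made up for illustration. In a real job the
in-memory map would presumably be replaced by an HBase table accessed through
the HBase format classes (TableInputFormat and TableOutputFormat), with the map
input restricted by timestamp; treat that mapping as an assumption and check
the API documentation for the details.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory simulation of the iterative scheme; hypothetical, no HBase classes used. */
final class IterativeStoreSketch {

    /** One stored value plus the iteration number that wrote it (stand-in for a cell timestamp). */
    static final class Cell {
        final long value;
        final long timestamp;

        Cell(long value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** key -> every version written so far; a stand-in for the table. */
    private final Map<String, List<Cell>> store = new TreeMap<>();

    void put(String key, long value, long timestamp) {
        store.computeIfAbsent(key, k -> new ArrayList<>()).add(new Cell(value, timestamp));
    }

    /** Returns the value with the highest timestamp for the given key. */
    long latest(String key) {
        Cell newest = null;
        for (Cell cell : store.getOrDefault(key, Collections.<Cell>emptyList())) {
            if (newest == null || cell.timestamp >= newest.timestamp) {
                newest = cell;
            }
        }
        if (newest == null) {
            throw new IllegalArgumentException("no such key: " + key);
        }
        return newest.value;
    }

    /** Runs one iteration: reads cells stamped 'previous', writes results stamped 'current'. */
    void runIteration(long previous, long current) {
        // Map phase: only values added in the previous iteration are visible.
        // The illustrative map function doubles each value and keeps the key.
        Map<String, List<Long>> mapOutput = new TreeMap<>();
        for (Map.Entry<String, List<Cell>> row : store.entrySet()) {
            for (Cell cell : row.getValue()) {
                if (cell.timestamp == previous) {
                    mapOutput.computeIfAbsent(row.getKey(), k -> new ArrayList<>())
                             .add(cell.value * 2);
                }
            }
        }

        // Reduce phase: for each key, combine the map output with all values
        // already in the store under that key. The illustrative reduce is a sum.
        Map<String, Long> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : mapOutput.entrySet()) {
            long sum = 0;
            for (long mapped : group.getValue()) {
                sum += mapped;
            }
            for (Cell stored : store.get(group.getKey())) {
                sum += stored.value;
            }
            reduced.put(group.getKey(), sum);
        }

        // Write the reduce output back, stamped with the current iteration.
        for (Map.Entry<String, Long> result : reduced.entrySet()) {
            put(result.getKey(), result.getValue(), current);
        }
    }

    public static void main(String[] args) {
        IterativeStoreSketch sketch = new IterativeStoreSketch();
        sketch.put("a", 1, 0);
        sketch.put("b", 2, 0);
        sketch.runIteration(0, 1);
        sketch.runIteration(1, 2);
        System.out.println("a=" + sketch.latest("a") + " b=" + sketch.latest("b"));
    }
}
```

The results are buffered before being written back so that the reduce step in
one iteration never sees values produced in that same iteration.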

Regards
-...@nkur


On 2/15/10 1:35 PM, "Raghava Mutharaju" <[email protected]> wrote:

Hello all,

I am relatively new to MapReduce and haven't used HBase at all. Is the
following architecture possible?

A distributed key-value store (HBase) is used, so each value has a timestamp
associated with it. Map and Reduce tasks are executed iteratively. In each
iteration, Map should take in the values that were added to the store in the
previous iteration (perhaps the ones with the latest timestamp?). Reduce
should take in Map's output as well as the <key,value> pairs from the store
whose keys match the keys that Reduce has to process in the current
iteration. The output of Reduce goes to the store.

If this is possible, which classes (e.g. InputFormat, run() of Reduce) should
be extended so that the above operation takes place instead of the regular
one? If this is not possible, are there any alternatives that achieve the
same?

Thank you.

PS: I have put the same question on the mapreduce-user Apache mailing list
(but haven't got any replies yet). I found many topics on MapReduce in this
mailing list as well, so I thought of posting it here also.

Regards,
Raghava.
